COUNTING POINTS ON VARIETIES OVER FINITE 
FIELDS OF SMALL CHARACTERISTIC 



ALAN G.B. LAUDER AND DAQING WAN 

Abstract. We present a deterministic polynomial time algorithm for 
computing the zeta function of an arbitrary variety of fixed dimension 
over a finite field of small characteristic. One consequence of this result 
is an efficient method for computing the order of the group of rational 
points on the Jacobian of a smooth geometrically connected projective 
curve over a finite field of small characteristic. 



1. Introduction 

The purpose of this paper is to give an elementary and self-contained proof 
that one may efficiently compute zeta functions of arbitrary varieties of fixed 
dimension over finite fields of suitably small characteristic. This is achieved 
via the p-adic methods developed by Dwork in his proof of the rationality 
of the zeta function of a variety over a finite field [4, 5]. Dwork's theorem
shows that it is in principle possible to compute the zeta function. Our 
main contribution is to show how Dwork's trace formula, Bombieri's degree 
bound [3] and a semi-linear reduction argument yield an efficient algorithm 
for doing so. That p-adic methods may be used to efficiently compute zeta 
functions for small characteristic was first suggested in [20, 22], where Wan
gives a simpler algorithm for counting the number of solutions to an equation 
over a finite field modulo small powers of the characteristic. 

We now give more details of our results. For $q = p^a$, where $p$ is a prime
number and $a$ a positive integer, let $\mathbb{F}_q$ denote a finite field with $q$ elements.
Let $\bar{\mathbb{F}}_q$ denote an algebraic closure of $\mathbb{F}_q$, and $\mathbb{F}_{q^k}$ the subfield of
$\bar{\mathbb{F}}_q$ of order $q^k$. Denote by $\mathbb{F}_q[X_1, \dots, X_n]$ the ring of all polynomials
in $n$ variables over $\mathbb{F}_q$.
For a polynomial $f \in \mathbb{F}_q[X_1, \dots, X_n]$, we denote by $N_k$ the number of
solutions to the equation $f = 0$ with coordinates in $\mathbb{F}_{q^k}$. The zeta function
of the variety defined by $f$ is the formal power series in $T$ with non-negative
integer coefficients

\[ Z(f/\mathbb{F}_q)(T) = \exp\Big( \sum_{k=1}^{\infty} \frac{N_k}{k} T^k \Big). \]
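
As a quick sanity check of this definition (an illustrative example of ours, not taken from the paper), take $f = X_1$ in $n = 2$ variables. The zero set is a line, so $N_k = q^k$ for every $k$, and the exponential sums to a rational function of $T$:

```latex
\[
Z(f/\mathbb{F}_q)(T)
  = \exp\Big( \sum_{k \ge 1} \frac{q^k T^k}{k} \Big)
  = \exp\big( -\log(1 - qT) \big)
  = \frac{1}{1 - qT}.
\]
```

With $n = 1$ the same polynomial has $N_k = 1$ for all $k$, and the identical computation gives $1/(1 - T)$.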

Dwork's theorem asserts that $Z(f/\mathbb{F}_q)$ is a rational function $r(T)/s(T)$ with
integer coefficients. From this it follows that knowledge of explicit bounds
$\deg(r) \le D_1$ and $\deg(s) \le D_2$, and of the values $N_k$ for $k = 1, 2, \dots, D_1 + D_2$
is enough to efficiently determine $Z(f/\mathbb{F}_q)$, see [22]. The Bombieri degree
bound tells us that $\deg(r) + \deg(s) \le (4d + 9)^{n+1}$, see [3, 22], and so in
particular we may take $D_1 = D_2 = (4d + 9)^{n+1}$. Each number $N_k$ can be
computed in a naive fashion by straightforward counting, using $q^{nk}$ evaluations
of the polynomial $f$. Thus one may compute the zeta function of a
variety, but the naive method described requires a number of steps that is
exponential in the parameters $d^n$ and $\log q$, where $d$ is the total degree of
$f$. Note that the dense input size of $f$ is $O((d + 1)^n \log q)$, and the size of
the zeta function is polynomial in $(d + 1)^n \log q$, by the Bombieri degree
bound. We prove the following theorem.

Theorem 1. There exist an explicit deterministic algorithm and an explicit
polynomial $P$ such that for any $f \in \mathbb{F}_q[X_1, \dots, X_n]$ of total degree $d$, where
$q = p^a$ and $p$ is prime, the algorithm computes the zeta function $Z(f/\mathbb{F}_q)(T)$
of $f$ in a number of bit operations which is bounded by $P(p^n d^n a^n)$.

In particular, this computes the zeta function of a polynomial in a fixed 
number of variables over a finite field of "small characteristic" in deter- 
ministic polynomial time. Note that our result makes no assumption of 
non-singularity on the variety defined by the polynomial. Also, we shall 
explicitly describe all the algorithms in this paper, rather than just prove 
their existence, and we assume that the finite field $\mathbb{F}_q$ itself is presented as
input via an irreducible polynomial of degree $a$ over the prime field $\mathbb{F}_p$, as
explained in Section 3.
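
For comparison with Theorem 1, the naive counting method mentioned earlier can be sketched in a few lines. This is only an illustration under simplifying assumptions of ours: it handles the prime field $\mathbb{F}_p$ (so $a = 1$ and $k = 1$), the function name `naive_count` is hypothetical, and the example curve is our own choice. It makes the $p^n$ evaluations explicit.

```python
from itertools import product


def naive_count(f, p, n):
    """Count x in (F_p)^n with f(x) = 0, using exactly p**n evaluations of f.

    f is a Python callable taking n integers; arithmetic is reduced mod p.
    """
    return sum(1 for x in product(range(p), repeat=n) if f(*x) % p == 0)


# Hypothetical example: the affine curve Y^2 = X^3 + X + 1 over F_5.
curve = lambda x, y: y * y - (x ** 3 + x + 1)
print(naive_count(curve, 5, 2))
```

The loop visits every point of $(\mathbb{F}_p)^n$, which is why this approach is exponential in $n$ and in $\log p$.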

With regard to the exponents in the algorithm, we state these precisely
in Theorem 37. For now we observe that if $p$, $d$ and $n$ are fixed then the
time required to compute the number of points in $\mathbb{F}_q$ is $O(a^{3n+7})$, with
space complexity $O(a^{2n+4})$. Here we are ignoring logarithmic factors (see
also Proposition 36). Note that all our complexity estimates are made using
standard methods for multiplication in various rings, and can be modestly
reduced with faster methods.

We also present refinements to this result based upon the ideas of Adolphson
and Sperber [2], and indeed from the outset will follow their approach,
as it involves little extra complication. This refinement takes into account
the terms which actually occur in the polynomial $f$ rather than working
solely with the total degree. We shall need more definitions: the support
of a polynomial $f$ is the set of exponents $r = (r_1, \dots, r_n)$ of non-zero terms
$a_r X_1^{r_1} \cdots X_n^{r_n}$ which occur in $f$, thought of as points in $\mathbb{R}^n$. The Newton
polytope of $f$ is defined to be the convex hull in $\mathbb{R}^n$ of the support of $f$.
Our refined version of Theorem 1 essentially replaces the parameter $d^n$ with
the normalised volume of the Newton polytope of $f$ (see Proposition 35 and
Section 6.3.3).

Zeta functions may also be defined for a finite collection of polynomials,
and we next describe how our results may be extended to this case. An
affine variety $V$ over $\mathbb{F}_q$ is the set of common zeros in $(\bar{\mathbb{F}}_q)^n$ of a set of
polynomials $f_1, f_2, \dots, f_r$. An analogous zeta function $Z(V/\mathbb{F}_q)$ may then
be defined in terms of the number of solutions in each finite extension field of
$\mathbb{F}_q$. These numbers may be computed using an inclusion-exclusion argument
involving the polynomials $\prod_{i \in S} f_i$ where $S$ is a subset of $\{1, 2, \dots, r\}$, see
[22]. Theorem 1 then easily yields the following.

Corollary 2. There exist an explicit deterministic algorithm and an explicit
polynomial $Q$ with the following property. Let $f_1, \dots, f_r \in \mathbb{F}_q[X_1, \dots, X_n]$
have total degrees $d_1, \dots, d_r$ respectively, where $q = p^a$ and $p$ is prime.
Denote by $V$ the affine variety defined by the common vanishing of these
polynomials, and define $d = \sum_{i=1}^{r} d_i$. The algorithm computes the zeta
function $Z(V/\mathbb{F}_q)(T)$ in a number of bit operations which is bounded by
$Q(p^n d^n a^n 2^r)$.
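
The inclusion-exclusion step behind Corollary 2 can be illustrated as follows. The zero set of a product $\prod_{i \in S} f_i$ is the union of the zero sets of the $f_i$ with $i \in S$, so the number of common zeros equals $\sum_{\emptyset \ne S} (-1)^{|S|+1} N(\prod_{i \in S} f_i)$. The sketch below checks this by brute force over a prime field $\mathbb{F}_p$; the function names and the example polynomials are assumptions of ours, and the hypersurface counts are done naively rather than by the algorithm of Theorem 1.

```python
from itertools import combinations, product
from math import prod


def count_zeros(f, p, n):
    """Naive count of zeros of a single polynomial f in (F_p)^n."""
    return sum(1 for x in product(range(p), repeat=n) if f(*x) % p == 0)


def count_common_zeros(fs, p, n):
    """Common zeros of f_1..f_r via inclusion-exclusion over products of the f_i."""
    total = 0
    for size in range(1, len(fs) + 1):
        for subset in combinations(fs, size):
            # Product of the polynomials indexed by the subset S.
            g = lambda *x, subset=subset: prod(f(*x) for f in subset)
            total += (-1) ** (size + 1) * count_zeros(g, p, n)
    return total


# Hypothetical example over F_7: the circle x^2 + y^2 = 1 and the line x = y.
circle = lambda x, y: x * x + y * y - 1
line = lambda x, y: x - y
print(count_common_zeros([circle, line], 7, 2))
```

The sum ranges over the $2^r - 1$ non-empty subsets, which is the source of the factor $2^r$ in the bound.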

Thus one has an efficient algorithm for computing the zeta function of 
an arbitrary affine variety over $\mathbb{F}_q$ assuming the characteristic, dimension
and the number of defining polynomials are fixed. More generally still, an 
arbitrary variety is defined through patching together suitable affine vari- 
eties. Zeta functions for such general varieties may be defined. These zeta 
functions may be computed using the above ideas provided explicit data 
are given on how to construct them from affine patches. As an example, 
the zeta functions of arbitrary projective varieties or toric varieties may be 
computed in this way. 

Conceptually our algorithm is rather straightforward. The zeta function 
can be expressed in terms of the "characteristic power series" (Fredholm
determinant) of a certain "lifting of Frobenius" which acts on an infinite 
dimensional p-adic Banach space constructed by Dwork. Under modular 
reduction, we obtain an operator acting on a finite dimensional vector space. 
Unfortunately, this operator cannot be computed efficiently directly from its 
definition; however, it can be expressed as a product of certain semi-linear 
operators each of which can be computed efficiently if the characteristic p 
is small. This last step is thus of crucial importance in deriving an efficient 
algorithm. Note that the same idea is used in [20, 22] in a simpler situation.
In more concrete language, the algorithm requires one to construct a certain 
"semi-linear" finite matrix and compute the "linear" matrix which is the 
product of the Galois conjugates of the semi-linear matrix. The number of 
points is then read off from the trace of the final linear matrix. The zeta
function can be computed from the characteristic polynomial of the final 
linear matrix. The semi-linear matrix itself is defined over a certain finite 
"p-adic lifting" of the original finite field. 

In the literature algorithms have already been developed for computing 
zeta functions of curves and abelian varieties [U EJ El El HE] . These utilise 
the theory originally developed by Weil for abelian varieties, whereas we 
use Dwork's more general and simpler p-adic theory. For example, in the 
case of a smooth geometrically irreducible projective plane curve of degree 
$d$ over a field of size $q = p^a$ these algorithms have a time complexity which
grows as $(\log q)^{C_d}$, where the constant $C_d$ grows exponentially in the degree
$d$. Given an absolutely irreducible bivariate polynomial $f$ of degree $d$ over
$\mathbb{F}_q$, the zeta function of the unique smooth projective curve birational to the
affine curve defined by $f$ may be computed in time polynomial in $d$, $p$ and
$a$ using our approach. Thus our more general method is far better in terms
of the degree d if the characteristic p is small, since the running time has 
polynomial growth in d (but much worse if p is large). The zeta function 
immediately gives the order of the group of rational points on the Jacobian, 
see [22]. 

Corollary 3. There exist an explicit deterministic algorithm and an explicit
polynomial $R$ with the following property. Let $V$ be a geometrically
irreducible affine curve defined by the vanishing of polynomials $f_1, \dots, f_r \in
\mathbb{F}_q[X_1, \dots, X_n]$ of total degrees $d_1, \dots, d_r$ respectively, where $q = p^a$ and $p$
is prime. Denote by $\tilde{V}$ the unique smooth projective curve birational to the
affine curve $V$, and let $d = \sum_{i=1}^{r} d_i$. The algorithm computes the order of the
group of rational points on the Jacobian of $\tilde{V}$ in a number of bit operations
bounded by $R(p^n d^n a^n 2^r)$.

Thus one may compute the order of the group of rational points on the 
Jacobian of a smooth geometrically irreducible projective curve over a finite 
field of small characteristic in deterministic polynomial time, provided the 
number of variables and the number of defining equations are fixed. (Notice 
that the case r = 1 and n = 2 corresponds to that of being given a possibly 
singular plane model of the curve.) In particular, this answers a question 
posed in [15], which Poonen attributes to Katz and Sarnak. We note that 
recently a similar result for special classes of plane curves was independently 
obtained in [HI [TO], using Monsky-Washnitzer's method. Also, a different p- 
adic approach for elliptic curves has been developed in [16].

This paper is written primarily for theoretical computational interest, in 
obtaining a deterministic polynomial time algorithm for computing the zeta 
function in full generality if p is small. It can certainly be improved in many 
ways for practical computations. What we have done in this paper is to work 
on the easier but more flexible "chain level". A general improvement in the
smooth case is to work on the cohomology level. Then there are several
related p-adic cohomology theories available, each leading to a somewhat
different version of the algorithm. Again, as indicated in [20, 22], these p-adic
methods are expected to be practical only for small p. (The authors have 
recently applied Dwork's cohomology theory to derive a practical algorithm 
in a special case [12].) 

2. Additive character sums over finite fields 

The most natural objects of study in Dwork's theory are certain additive 
character sums over finite fields. In this section we introduce these sums, 
and explain their connection to varieties. 

An additive character $\Psi$ is a mapping from $\mathbb{F}_{q^k}$ to the group of units of
some commutative ring $S$ with identity 1 such that $\Psi(x + y) = \Psi(x)\Psi(y)$
for $x, y \in \mathbb{F}_{q^k}$. We say that it is non-trivial if $\Psi(x) \ne 1$ for some $x \in \mathbb{F}_{q^k}$.
Let $\{\Psi_k\}_{k \ge 1}$ be any family of mappings with each $\Psi_k$ a non-trivial additive
character from $\mathbb{F}_{q^k}$ to some extension ring of the integers whose image is a
group of order $p$ with elements summing to zero. We assume that the family
$\{\Psi_k\}$ forms a tower of characters in the sense that for each $k \ge 2$,

\[ \Psi_k = \Psi_1 \circ \mathrm{Tr}_{\mathbb{F}_{q^k}/\mathbb{F}_q}. \]

In our application, the ring $S$ will be taken to be a certain p-adic ring.

For the remainder of the paper we will use multi-index notation. Specifically,
we let $X^u$ represent the monomial $X_0^{u_0} X_1^{u_1} \cdots X_n^{u_n}$ for an integer vector
$u = (u_0, u_1, \dots, u_n)$; let $x$ be an $(n + 1)$-tuple $(x_0, x_1, \dots, x_n)$ of field
elements; and $X$ the list of indeterminates $X_0, X_1, \dots, X_n$. Observe here that
we have introduced an extra indeterminate $X_0$. Even though the polynomial
$f$ whose zeta function we wish to compute is in the $n$ variables $X_1, \dots, X_n$,
in Dwork's theory the extra indeterminate arises naturally, as we are about
to see.

Lemma 4. Let $f \in \mathbb{F}_q[X_1, \dots, X_n]$ and $N_k^*$ denote the number of solutions
to the equation $f = 0$ in the affine torus $(\mathbb{F}_{q^k}^*)^n$. Then

\[ \sum_{x \in (\mathbb{F}_{q^k}^*)^{n+1}} \Psi_k(x_0 f(x_1, \dots, x_n)) = q^k N_k^* - (q^k - 1)^n \]

where $x = (x_0, x_1, \dots, x_n)$.

Proof. We have that for any $u \in \mathbb{F}_{q^k}$

\[ \sum_{x_0 \in \mathbb{F}_{q^k}} \Psi_k(x_0 u) =
   \begin{cases} 0 & \text{if } u \in \mathbb{F}_{q^k}^*, \\ q^k & \text{if } u = 0. \end{cases} \]

This is a standard result from the theory of additive character sums [13,
page 168]. Thus $\sum \Psi_k(x_0 f(x_1, \dots, x_n))$, where the sum is taken over points
$x \in \mathbb{F}_{q^k} \times (\mathbb{F}_{q^k}^*)^n$, equals $q^k N_k^*$. Removing the contribution of $(q^k - 1)^n$ from
the terms with $x_0 = 0$ in this sum gives the required result. □
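
Lemma 4 can be checked numerically in the simplest setting. The sketch below is an illustration of ours, not the paper's p-adic construction: it works over a prime field $\mathbb{F}_p$ (so $k = 1$, $a = 1$) and, as a stated assumption, uses the complex-valued character $t \mapsto e^{2\pi i t/p}$ in place of the p-adic character. The proof above only uses that the character is non-trivial and additive, so the identity still applies. The function names are hypothetical.

```python
import cmath
from itertools import product


def torus_char_sum(f, p, n):
    """Sum of psi(x_0 * f(x_1..x_n)) over all x in (F_p^*)^(n+1)."""
    psi = lambda t: cmath.exp(2j * cmath.pi * (t % p) / p)  # complex additive character
    return sum(psi(x[0] * f(*x[1:])) for x in product(range(1, p), repeat=n + 1))


def torus_count(f, p, n):
    """N^*: number of zeros of f with all coordinates non-zero."""
    return sum(1 for x in product(range(1, p), repeat=n) if f(*x) % p == 0)


# Hypothetical example: f = X_1 + X_2 + 1 over F_5.
f = lambda x, y: x + y + 1
p, n = 5, 2
lhs = torus_char_sum(f, p, n)
rhs = p * torus_count(f, p, n) - (p - 1) ** n
print(abs(lhs - rhs) < 1e-9)
```

The comparison is done up to floating-point error because the character values are complex numbers rather than exact p-adic elements.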

Our next step will be to find an alternative formula for the lefthand side
of the equation in Lemma 4. This is achieved in Proposition 11, which leads
us eventually to Dwork's Trace Formula (Theorem 26).

3. p-Adic Theory 

3.1. p-adic rings. We first introduce notation for the p-adic rings we shall
need, before explaining how to construct them and compute in them. Let
$\mathbb{Q}_p$ be the field of p-adic rationals, and $\mathbb{Z}_p$ the ring of p-adic integers (see
[11]). Denote by $\Omega$ the completion of an algebraic closure of $\mathbb{Q}_p$. Select
$\pi \in \Omega$ with $\pi^{p-1} = -p$ and define $R_1 = \mathbb{Z}_p[\pi]$, a totally ramified extension
of $\mathbb{Z}_p$ of degree $p - 1$. By binomial expansion and Hensel's lifting lemma,
one sees that the equation $(1 + \pi t)^p = 1$ has exactly $p$ distinct solutions $t$
in $R_1$. In particular, $R_1$ contains all $p$-th roots of unity. The motivation
behind the introduction of $R_1$ is that to define an additive character of order
$p$, we need a small p-adic ring which contains a primitive $p$-th root of unity.

Let $R_0$ denote the ring of integers of the unique unramified extension of
$\mathbb{Q}_p$ in $\Omega$ of degree $a$, where $q = p^a$. Finally, let $R$ be the compositum ring
of $R_1$ and $R_0$. We have the diagram of ring extensions

          R
        /   \
     R_0     R_1
        \   /
         Z_p

The residue class ring of $R_0$ is $\mathbb{F}_q$, and we shall "lift" the coefficients of
the polynomial $f \in \mathbb{F}_q[X_1, \dots, X_n]$ to this characteristic zero ring, using the
Teichmüller lifting. Precisely, the Teichmüller lift $\omega(x)$ of a non-zero element
$x \in \bar{\mathbb{F}}_q$ is defined as the unique root of unity in the maximal unramified
extension of $\mathbb{Q}_p$ which is congruent to $x$ modulo $p$ and has order coprime
to $p$. We define $\omega(0) = 0$. Thus the compositum $R$ contains both the
lifting of the coefficients of $f$ and the image of an additive character we
shall construct.
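
To make the Teichmüller lift concrete, here is a minimal sketch for the prime field case $q = p$, working modulo $p^N$. It is an illustration of ours: the function name `teichmuller` is hypothetical, and instead of the Newton iteration analysed later in the paper it uses the simpler (slower) observation that if $t \equiv \omega(x) \bmod p^i$ then $t^p \equiv \omega(x) \bmod p^{i+1}$.

```python
def teichmuller(x, p, N):
    """Teichmuller lift of x in F_p, as an integer modulo p**N.

    Starts from any integer representative of x and raises to the p-th
    power N - 1 times; each step gains one p-adic digit of precision.
    """
    mod = p ** N
    t = x % p
    for _ in range(N - 1):
        t = pow(t, p, mod)
    return t


# Hypothetical example: lift 2 in F_5 to precision 5**4.
w = teichmuller(2, 5, 4)
print(w % 5, pow(w, 4, 5 ** 4))  # residue of the lift, and its (p-1)-th power
```

The two printed values confirm the defining properties: the lift reduces to $x$ modulo $p$ and is a $(p-1)$-th root of unity modulo $p^N$.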

3.2. Algorithmic aspects. 

3.2.1. Construction and lifting Frobenius. We assume that $\mathbb{F}_q$ is presented
as the quotient $\mathbb{F}_p[y]/(h)$ where $h(y)$ is a monic, irreducible polynomial of
degree $a$ over the prime field $\mathbb{F}_p$. For any positive integer $N$ we describe how
the quotient ring $R/(p^N)$ may be constructed: Lift the polynomial $h$ to an
integer polynomial $\tilde{h}$ whose coefficients lie in the open interval
$(-p/2, (p+1)/2)$. Take the unramified extension $R_0$ to be $\mathbb{Z}_p[\mu] = \mathbb{Z}_p[y]/(\tilde{h})$. Elements
in $R_0/(p^N)$ can now be represented as linear combinations over $\mathbb{Z}_p/(p^N)$
of the basis elements $1, \mu, \dots, \mu^{a-1}$. (Recall that $\mathbb{Z}_p/(p^N)$ can be identified
with $\mathbb{Z}/(p^N)$.) The extension $R/(p^N) = R_0[\pi]/(p^N)$ is easily constructed by
adjoining an element $\pi$ and specifying the relation $\pi^{p-1} = -p$.
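
The representation of $R_0/(p^N)$ just described can be sketched directly. The code below is a minimal illustration under assumptions of ours: elements are coefficient lists of length $a$ on the basis $1, \mu, \dots, \mu^{a-1}$, the lifted polynomial is passed as a low-to-high coefficient list `h` (monic, already lifted to the integers), and the function name `mul_mod` is hypothetical. It uses schoolbook multiplication, in the spirit of the "standard methods" the paper assumes.

```python
def mul_mod(u, v, h, p, N):
    """Multiply u and v in (Z/p^N)[y]/(h).

    u, v: coefficient lists of length a (low degree first).
    h:    coefficients h_0..h_a of a monic degree-a integer polynomial.
    """
    mod, a = p ** N, len(h) - 1
    out = [0] * (2 * a - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] = (out[i + j] + ui * vj) % mod
    # Reduce degrees >= a using y^a = -(h_0 + h_1 y + ... + h_{a-1} y^{a-1}).
    for d in range(2 * a - 2, a - 1, -1):
        c, out[d] = out[d], 0
        for k in range(a):
            out[d - a + k] = (out[d - a + k] - c * h[k]) % mod
    return out[:a]


# Hypothetical example: F_4 = F_2[y]/(y^2 + y + 1), lifted to precision 2**3.
h = [1, 1, 1]
mu = [0, 1]
print(mul_mod(mu, mul_mod(mu, mu, h, 2, 3), h, 2, 3))  # mu cubed
```

In this small example $\mu^3 = 1$ holds exactly, since $y^3 - 1 = (y - 1)(y^2 + y + 1)$ over the integers.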

We define a lifting of the Frobenius automorphism on $\mathbb{F}_q$ to an automorphism
of $R$ which is the identity on $R_1$. Define the map $\tau : R \to R$ by setting
$\tau(\mu)$ to be the unique root of $\tilde{h}$ which is congruent to $\mu^p$ modulo $p$. Define
$\tau(\pi) = \pi$, and extend to the whole of $R$ by insisting $\tau$ is an automorphism.
We extend $\tau$ to act on $R[[X]]$ coefficient-wise, fixing monomials. Here $R[[X]]$
is the ring of all power series in the indeterminates $X = X_0, \dots, X_n$ with
coefficients from $R$.

3.2.2. Complexity of arithmetic. In this section we bound the complexity 
of the basic arithmetic operations in the ring R/(p N ), along with that of 
computing the map r and Teichmuller lifts. Note that these estimates are 
all the simplest possible, and can be improved using more advanced methods. 
The reader may wish to skip the proof of the next lemma, and refer back
when required in Section 6.3.2.

Lemma 5. Elements in $R/(p^N)$ can be represented using $O(paN \log p)$ bits.
Addition and subtraction can be performed in $O(paN \log p)$ bit operations,
and multiplication and inversion of units in $O((paN \log p)^2)$ bit operations.
The Teichmüller lifting to $R/(p^N)$ of a finite field element can be computed
in $O((a \log p)^3 N^2)$ bit operations. For any $1 \le i \le a - 1$, the map $\tau^i$ on
$R/(p^N)$ may be evaluated using $O(p(aN \log p)^2)$ bit operations. (For the
powers of $\tau$ we require a total of $O(a^4 N^2 (\log p)^3)$ bits of precomputation.)

Proof. Elements in $R/(p^N)$ can be written as

\[ \sum_{i=0}^{a-1} \sum_{j=0}^{p-2} c_{ij} \mu^i \pi^j \tag{1} \]

where the coefficients $c_{ij}$ belong to the ring $\mathbb{Z}_p/(p^N)$ of size $p^N$. The bit
size of such an expression is $a(p - 1)\log(p^N) = O(paN \log p)$. Addition
and subtraction are straightforward, just involving the addition of integers
and reduction modulo $p^N$. Likewise, multiplication of two expansions of the
form (1) is straightforward, using the reduction relations $\pi^{p-1} = -p$ and
$\tilde{h}(\mu) = 0$.

For Teichmüller lifting and inversion we shall use Newton iteration with
quadratic convergence. Specifically, we shall use Newton lifting with respect
to the prime $\pi$ in $R$, and define $l = (p - 1)N$ so that $\pi^l = (-p)^N$. Given a
polynomial $\phi(Y) \in R[Y]$ and an element $g_0 \in R$ such that $\phi(g_0) \equiv 0 \bmod \pi$
and $\phi'(g_0)$ is invertible (modulo $\pi$) with inverse $s_0$ modulo $\pi$, this algorithm
computes an element $g \in R/(\pi^l)$ such that $\phi(g) \equiv 0 \bmod \pi^l$ and
$g \equiv g_0 \bmod \pi$ (compare with [7, Algorithm 9.22]). For $i \ge 1$, assuming that
$g_{i-1}$ and $s_{i-1}$ have been found, we define $g_i = g_{i-1} - \phi(g_{i-1}) s_{i-1} \bmod \pi^{2^i}$
and $s_i = 2s_{i-1} - \phi'(g_i) s_{i-1}^2 \bmod \pi^{2^i}$. As in [7, Theorem 9.23] one checks that
$g_i \equiv g_0 \bmod \pi$, $\phi(g_i) \equiv 0 \bmod \pi^{2^i}$ and $s_i \equiv \phi'(g_i)^{-1} \bmod \pi^{2^i}$ at the $i$th step.
Thus after $\lceil \log_2(l) \rceil$ steps we shall have found the required approximate root
$g = g_{\lceil \log_2(l) \rceil}$. The $i$th step involves three additions/subtractions and three
multiplications in the ring $R/(\pi^{2^i})$, which requires $O((2^i a \log p)^2)$ bit operations,
along with evaluation of the polynomials $\phi$ and $\phi'$ modulo $\pi^{2^i}$. Let
$c(\phi, i)$ be the complexity of these latter operations. Thus the total complexity
is $O((paN \log p)^2 + c(\phi))$ bit operations where $c(\phi) = \sum_{i=1}^{\lceil \log_2(l) \rceil} c(\phi, i)$.
Note that to compute approximate roots in the ring $R_0$ rather than $R$,
one can lift using the prime $p$ rather than $\pi$ and obtain a complexity of
$O((aN \log p)^2 + c(\phi))$.

Suppose now we are given a unit $x \in R/(p^N)$. We compute a Newton
lifting starting from an element $g_0 \in R/(\pi)$ such that $x g_0 \equiv 1 \bmod \pi$; that
is, we use the equation $\phi(Y) = xY - 1$. Now $R/(\pi)$ is just the finite field
$\mathbb{F}_q$ and so an inverse $g_0$ of $x$ modulo $\pi$ can be computed in $O((a \log p)^2)$ bit
operations [7, Corollary 4.6]. (Here $\phi'(Y) = x$ and so $s_i = g_i$, and so in fact
we only need to iterate the formula for $s_i$.) In this case $c(\phi, i)$ is just one
multiplication and a subtraction modulo $\pi^{2^i}$. By the above paragraph the
total complexity is then $O((paN \log p)^2)$ for inversion. For Teichmüller lifts
we use the same approach, only with the polynomial $\phi(Y) = Y^{q-1} - 1$ and
lifting in $R_0$ via the element $p$. Here using a fast exponentiation routine
we find that $c(\phi, i)$ involves $O(\log q)$ multiplications and a subtraction in
$R_0/(p^{2^i})$. Thus $c(\phi) = O((aN \log p)^2 \log q)$ which gives the Teichmüller
lifting estimate.
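
The inversion step can be sketched in the simplest case. The code below is an illustration under simplifying assumptions of ours: it works in $\mathbb{Z}/(p^N)$ (that is, $a = 1$ and lifting with respect to $p$ rather than $\pi$), and the function name `newton_inverse` is hypothetical. It iterates only the formula for $s_i$, as the proof notes is sufficient for $\phi(Y) = xY - 1$.

```python
def newton_inverse(x, p, N):
    """Inverse of a unit x modulo p**N by Newton lifting with quadratic convergence."""
    s = pow(x, -1, p)  # starting inverse modulo p (requires Python >= 3.8)
    prec = 1
    while prec < N:
        prec = min(2 * prec, N)       # precision doubles at each step
        mod = p ** prec
        s = (2 * s - x * s * s) % mod  # s_i = 2 s_{i-1} - x s_{i-1}^2
    return s


# Hypothetical example: invert 3 modulo 7**10.
inv = newton_inverse(3, 7, 10)
print((3 * inv) % 7 ** 10)
```

Each step is correct because $x(2s - xs^2) = 1 - (1 - xs)^2$, so an error divisible by $p^e$ becomes divisible by $p^{2e}$.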

We precompute $\tau$ on the basis elements $1, \mu, \dots, \mu^{a-1}$. To do this, recall
that $\tau(\mu) \in R_0$ is defined as the unique root of $\tilde{h}$ which is congruent
modulo $p$ to $\mu^p$. This may be approximated modulo $p^N$ by Newton lifting
in $R_0$ with respect to $p$ using the polynomial $\phi(Y) = \tilde{h}(Y)$ and the initial
value $g_0 = \mu^p$. (Finding $\mu^p$ takes $O(a^2 (\log p)^3)$ bit operations and this is
absorbed in the stated precomputation estimate.) Here $c(\phi, i)$ is $O(a)$ additions
and multiplications in $R_0/(p^{2^i})$, using Horner's method for polynomial
evaluation [7, Page 93]. Thus the Newton lifting estimate gives a complexity
of $O(a(aN \log p)^2)$ bit operations. Using $\tau(\mu^i) = (\tau(\mu))^i$ one can now
find the image of all basis elements in a further $O(a(aN \log p)^2)$ bit operations.
One stores this information as a matrix for $\tau$ acting on $R_0/(p^N)$ as a
$\mathbb{Z}_p/(p^N)$-module with basis the powers of $\mu$. The map $\tau$ can now be computed
on any element in $R_0/(p^N)$ in $O(a^2 (N \log p)^2)$ bit operations using
linear algebra. By taking powers of the matrix, matrices for the maps $\tau^i$ for
$1 \le i \le a - 1$ can also be found in $O(a^3 \cdot a (N \log p)^2)$ bit operations. Thus the
total precomputation is bounded by $O(a^4 N^2 (\log p)^3)$, and each evaluation
of $\tau^i$ on $R_0/(p^N)$ takes $O((aN \log p)^2)$ bit operations. Finally, to evaluate
$\tau$ on $R/(p^N)$ one writes elements of $R$ on the $R_0$-basis $1, \pi, \dots, \pi^{p-2}$ and
applies $\tau$ component-wise. □

3.3. p-adic valuations and convergence of power series. Denote by
ord the additive valuation on $\Omega$ normalised so that $\mathrm{ord}(p) = 1$. Thus
$\mathrm{ord}(\pi) = 1/(p - 1)$. Define a p-adic norm $|\cdot|_p$ on $\Omega$ by $|x|_p = p^{-\mathrm{ord}(x)}$. The
set of all $x \in \Omega$ with $|x|_p \le 1$ (equivalently $\mathrm{ord}(x) \ge 0$) is called the closed
unit disk. Given any formal power series $\sum_r A_r X^r$ where $A_r \in \Omega$, we say it
converges at a point $x = (x_0, \dots, x_n) \in \Omega^{n+1}$ if the sequence of partial sums
$\sum_{|r| \le e} A_r x^r$ tends to a limit under the p-adic norm. (Here $|r| = \sum_{i=0}^{n} r_i$.)
This sequence of partial sums will converge if and only if the summands
$A_r x^r$ tend to zero p-adically (that is, are divisible in the ring of integers of
$\Omega$ by increasingly large powers of $p$), as $|r|$ goes to infinity. In particular,
if there is a real number $c > 0$ such that $\mathrm{ord}(A_r) \ge c|r|$ for all $r$, then the
series will certainly converge for all points $x$ which are Teichmüller liftings
of points over $\bar{\mathbb{F}}_q$. Note that throughout the paper we shall use the additive
valuation ord rather than the p-adic norm $|\cdot|_p$ itself.
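
On rational numbers the valuation is easy to compute, and a small sketch may help fix the normalisation $\mathrm{ord}(p) = 1$. The helper names `ord_p` and `digit_sum` are ours. The check at the end uses Legendre's classical formula $\mathrm{ord}(r!) = (r - s_p(r))/(p - 1)$, where $s_p(r)$ is the sum of the base-$p$ digits of $r$; this is the standard source of bounds such as $\mathrm{ord}(r!) < r/(p-1)$.

```python
from fractions import Fraction
from math import factorial


def ord_p(x, p):
    """p-adic valuation of a non-zero rational number x, with ord_p(p) = 1."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("ord(0) is +infinity")
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def digit_sum(r, p):
    """Sum of the base-p digits of a non-negative integer r."""
    s = 0
    while r:
        s += r % p
        r //= p
    return s


# Legendre's formula, checked for p = 3 and small r.
print(all(ord_p(factorial(r), 3) == (r - digit_sum(r, 3)) // 2 for r in range(1, 60)))
```

The p-adic norm is then $|x|_p = p^{-\mathrm{ord}_p(x)}$, so large valuation means small norm.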

4. Analytic representation of characters 

4.1. Dwork's splitting functions. We now need to find a suitable p-adic
expression for a non-trivial additive character from $\mathbb{F}_q$ to $R_1$. In the case of
complex characters, this is done via the exponential function. However, the
radius of convergence of the exponential function in $\Omega$ is not large enough;
in particular, it does not converge on the Teichmüller lifting of all the points
in $\mathbb{F}_q$. Instead we use the power series constructed by Dwork using the
exponential function. (The reader may find the discussion of the related
Artin-Hasse function on [11, Pages 92-93] helpful.)

Let $\mu$ be the Möbius function. Taking the logarithmic derivative, one
checks that the exponential function has the following product expansion

\[ \exp(z) = \sum_{k=0}^{\infty} \frac{z^k}{k!} = \prod_{k=1}^{\infty} (1 - z^k)^{-\mu(k)/k}. \]

This can be rewritten as

\[ \exp(z) = \prod_{(k,p)=1} (1 - z^k)^{-\mu(k)/k} (1 - z^{kp})^{\mu(k)/(kp)}. \]

It follows that

\[ \exp\Big(z + \frac{z^p}{p}\Big) = \prod_{(k,p)=1} (1 - z^k)^{-\mu(k)/k} (1 - z^{kp^2})^{\mu(k)/(kp^2)}. \tag{2} \]

Replacing $z$ by $\pi z$ in the above relation and noting that $\pi^p = -p\pi$, we define
a power series in $z$ by

\[ \theta(z) = \exp(\pi z - \pi z^p). \]

Writing $\theta(z) = \sum_{r=0}^{\infty} \lambda_r z^r$ we see that $\lambda_r = \pi^r / r!$ for $r < p$, and we shall
shortly show that all $\lambda_r$ lie in $R_1$. From (2) we get the product expansion

\[ \theta(z) = \prod_{(k,p)=1} (1 - \pi^k z^k)^{-\mu(k)/k} (1 - \pi^{kp^2} z^{kp^2})^{\mu(k)/(kp^2)}. \tag{3} \]
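By the binomial expansion, the first factor

\[ (1 - \pi^k z^k)^{-\mu(k)/k} = \sum_{j=0}^{\infty} (-1)^j \binom{-\mu(k)/k}{j} \pi^{jk} z^{jk} = \sum_{j=0}^{\infty} b_j(k) z^{jk} \]

is a power series in $z^k$. Now for $(k, p) = 1$, we have [11, Page 82]

\[ \mathrm{ord} \binom{-\mu(k)/k}{j} \ge 0. \]

Thus for $j > 0$ the coefficient of $z^{jk}$ satisfies

\[ \mathrm{ord}(b_j(k)) \ge \frac{jk}{p - 1} \ge \frac{p - 1}{p^2} jk. \]

Similarly, for the second factor, we write

\[ (1 - \pi^{kp^2} z^{kp^2})^{\mu(k)/(kp^2)} = \sum_{j=0}^{\infty} (-1)^j \binom{\mu(k)/(kp^2)}{j} \pi^{jkp^2} z^{jkp^2} = \sum_{j=0}^{\infty} c_j(kp^2) z^{jkp^2}. \]

Now for $j > 0$ we have $\mathrm{ord}(j!) < j/(p - 1)$, see [11, Page 79]. Thus for $j > 0$ and
$k$ positive and coprime to $p$, the coefficient of $z^{jkp^2}$ satisfies

\[ \mathrm{ord}(c_j(kp^2)) \ge \frac{jkp^2}{p - 1} - 2j - \frac{j}{p - 1} \ge \frac{p - 1}{p^2} jkp^2. \]

Putting the above two inequalities together, we conclude that for $r \ge 0$ the
coefficients $\lambda_r$ of $\theta(z)$ satisfy

\[ \mathrm{ord}(\lambda_r) \ge \frac{p - 1}{p^2} r, \qquad \lambda_r \in R_1. \tag{4} \]

This shows that the power series $\theta(z)$ is convergent in the disk $|z|_p < 1 + \varepsilon$
for some $\varepsilon > 0$. In particular, $\theta(z)$ converges on the closed unit disk, and
Definition 6 makes sense.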
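
The decay estimate $\mathrm{ord}(\lambda_r) \ge r(p-1)/p^2$ for the coefficients of $\theta(z) = \exp(\pi z - \pi z^p)$ can be checked exactly for small $r$. The sketch below is an illustration of ours with hypothetical function names. It expands the two exponentials, so that $\lambda_r = \sum_{i + pj = r} (-1)^j \pi^{i+j} / (i!\, j!)$, and stores elements of $\mathbb{Q}(\pi)$ as rational coordinates on $1, \pi, \dots, \pi^{p-2}$ using $\pi^{p-1} = -p$. Since the terms $c_k \pi^k$ have valuations with distinct fractional parts, the valuation of the sum is the minimum over the non-zero coordinates.

```python
from fractions import Fraction
from math import factorial


def ord_p(x, p):
    """p-adic valuation of a non-zero rational number."""
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def theta_coeff_ord(r, p):
    """ord(lambda_r) for theta(z) = exp(pi*z - pi*z^p); None if lambda_r = 0."""
    coords = [Fraction(0)] * (p - 1)  # coordinates on 1, pi, ..., pi^(p-2)
    for j in range(r // p + 1):
        i = r - p * j
        quo, rem = divmod(i + j, p - 1)  # pi^(i+j) = (-p)^quo * pi^rem
        coords[rem] += Fraction((-1) ** j * (-p) ** quo, factorial(i) * factorial(j))
    vals = [ord_p(c, p) + Fraction(k, p - 1) for k, c in enumerate(coords) if c != 0]
    return min(vals) if vals else None


# Compare with the bound r(p-1)/p^2 for a few small primes.
for p in (2, 3, 5):
    ok = all(
        (v := theta_coeff_ord(r, p)) is None or v >= Fraction(r * (p - 1), p * p)
        for r in range(41)
    )
    print(p, ok)
```

For $r < p$ the function returns exactly $r/(p-1)$, matching $\lambda_r = \pi^r / r!$.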

In the proof of the next lemma we shall use the fact that $\mathrm{ord}(\lambda_r) \ge
2/(p - 1)$ for $r \ge 2$, and so

\[ \theta(z) \equiv 1 + (\pi z) \bmod (\pi z)^2. \tag{5} \]

This can be seen as follows: The Artin-Hasse exponential function [11, Page
93]

\[ E(z) := \prod_{(k,p)=1} (1 - z^k)^{-\mu(k)/k} = \exp\Big( z + \frac{z^p}{p} + \frac{z^{p^2}}{p^2} + \cdots \Big) \]

has coefficients in $\mathbb{Z}_p$, because each factor in the product expansion does.
Since $E(\pi z) \equiv \theta(z) \bmod z^{p^2}$ we see

\[ \mathrm{ord}(\lambda_r) \ge \frac{r}{p - 1} \]

for $0 \le r < p^2$. This estimate combined with (4) gives (5).

Definition 6. [Dwork's splitting function] Let

\[ \Psi_k(z) = \prod_{i=0}^{ak-1} \theta(z^{p^i}) \in R_1[[z]], \]

and define the map $\Psi_k : \mathbb{F}_{q^k} \to R_1$ by

\[ \Psi_k(x) = \Psi_k(\omega(x)), \]

where $\omega$ is the Teichmüller map. (Recall that $q = p^a$.)

(That $\Psi_k$ has image in $R_1$ can be seen as follows: Let $R_1[\omega(\mathbb{F}_{q^k})]$ be $R_1$
adjoined the image of $\omega$ on $\mathbb{F}_{q^k}$. Then $R_1[\omega(\mathbb{F}_{q^k})]$ is an unramified extension
of $R_1$ of degree $ak$. The Galois group of the corresponding quotient field
extension is generated by $\tau$. The map $\tau$ acts on a Teichmüller point $\omega(x)$
as $\tau(\omega(x)) = \omega(x)^p$. Hence it fixes the element $\Psi_k(x)$ for $x \in \mathbb{F}_{q^k}$, and so
$\Psi_k(x) \in R_1$.)

Lemma 7. The maps $\Psi_k$ form a tower of non-trivial additive characters
from the fields $\mathbb{F}_{q^k}$ to the ring $R_1$.

Proof. (This is the case "s = 1" on [3, Pages 55-57].) We first show that
$\theta(1)$ is a primitive $p$-th root of unity. By (5) we see $\theta(1) \ne 1$. As a formal
power series in $z$,

\[ \theta(z)^p = \exp(p\pi z) \exp(-p\pi z^p). \]

Now, $\theta(z)$, $\exp(p\pi z)$ and $\exp(-p\pi z^p)$ are all convergent in $|z|_p < 1 + \varepsilon$
for some $\varepsilon > 0$. We can thus substitute $z = 1$ and find that $\theta(1)^p =
\exp(p\pi) \exp(-p\pi) = 1$. Thus $\theta(1)$ is a primitive $p$-th root of unity in $R_1$.
Next, for $\gamma \in R_0$ with $\gamma^{p^{ak}} = \gamma$, we claim that

\[ \prod_{i=0}^{ak-1} \theta(\gamma^{p^i}) = \theta(1)^{\gamma + \gamma^p + \cdots + \gamma^{p^{ak-1}}}. \]

Using (5) it is clear that both sides are congruent to

\[ 1 + \pi(\gamma + \gamma^p + \cdots + \gamma^{p^{ak-1}}) \]

modulo $\pi^2$. To prove the claim, it remains to prove that both sides are $p$-th
roots of unity. The right side is a $p$-th root of unity since $\theta(1)$ is a $p$-th root
of unity. The $p$-th power of the left side is

\[ \prod_{i=0}^{ak-1} \theta(\gamma^{p^i})^p = \exp\Big( p\pi \sum_{i=0}^{ak-1} (\gamma^{p^i} - \gamma^{p^{i+1}}) \Big) = \exp(p\pi\gamma) \exp(-p\pi\gamma^{p^{ak}}) = 1. \]

Thus, the left side is also a $p$-th root of unity. The claim is proved. Note
that the individual factor $\theta(\gamma)$ is not necessarily a $p$-th root of unity. We
conclude that

\[ \Psi_k(x) = \theta(1)^{\mathrm{Tr}_k(x)} \]

for any $x \in \mathbb{F}_{q^k}$, where $\mathrm{Tr}_k$ is the trace function from $\mathbb{F}_{q^k}$ to $\mathbb{F}_p$, and the
exponent is thought of as an integer. □

Note 8. The infinite sum

\[ \exp(\pi(z - z^p)) = \sum_{k=0}^{\infty} \frac{\pi^k (z - z^p)^k}{k!} \]

is convergent for $|z|_p < 1$, but not necessarily convergent for $|z|_p = 1$. If
$|z|_p = 1$, then it is possible that $|z - z^p|_p = 1$ and for such $z$ the above
infinite sum does not converge. Since the infinite sum does not converge
everywhere on the disk $|z|_p \le 1$, one cannot simply substitute $z = 1$ into
the above infinite sum and get the contradiction that $\theta(1) = 1$. There is no
contradiction here!

We now define another power series related to our original polynomial $f$
whose relevance will become apparent in Proposition 11.

Definition 9. Let $f$ be the polynomial whose zeta function we wish to compute,
and write

\[ X_0 f = \sum_{j \in J} \bar{a}_j X^j, \]

where $J$ is the support of $X_0 f$. Let $a_j$ be the Teichmüller lifting of $\bar{a}_j$. Let
$F$ be the formal power series in the indeterminates $X$ with coefficients in $R$
given by

\[ F(X) = \prod_{j \in J} \theta(a_j X^j). \]

Let $F^{(a)}(X)$ be the formal power series in the indeterminates $X$ with coefficients
in $R$ given by

\[ F^{(a)}(X) = \prod_{j \in J} \prod_{s=0}^{a-1} \theta\big( (a_j X^j)^{p^s} \big). \]

The relation between $F(X)$ and $F^{(a)}(X)$ is clear.

Lemma 10. Let the power series $F^{(a)}$ and $F$ be as in Definition 9. Then

\[ F^{(a)}(X) = \prod_{i=0}^{a-1} \tau^i\big( F(X^{p^i}) \big) \]

where the map $\tau$ acts coefficient-wise on the power series $F$.

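The power series $F^{(a)}(X)$ relates to rational point counting in the following
way.

Proposition 11. Let $f \in \mathbb{F}_q[X_1, \dots, X_n]$ and $F^{(a)} \in R[[X]]$ be as in Definition 9.
Denoting by $N_k^*$ the number of solutions to the equation $f = 0$ in
the affine torus $(\mathbb{F}_{q^k}^*)^n$, we have

\[ q^k N_k^* - (q^k - 1)^n = \sum_{x} F^{(a)}(x) F^{(a)}(x^q) \cdots F^{(a)}(x^{q^{k-1}}) \]

where the summation is over the Teichmüller lifting of points on the torus
$(\mathbb{F}_{q^k}^*)^{n+1}$.

Proof. For any point $\bar{x}$ in $(\mathbb{F}_{q^k})^{n+1}$ with Teichmüller lifting $x$ we have that

\[
\begin{aligned}
\Psi_k(\bar{x}_0 f(\bar{x}_1, \dots, \bar{x}_n))
  &= \Psi_k\Big( \sum_{j \in J} \bar{a}_j \bar{x}^j \Big) \\
  &= \prod_{j \in J} \Psi_k(\bar{a}_j \bar{x}^j) \\
  &= \prod_{j \in J} \Psi_k(a_j x^j) \\
  &= \prod_{j \in J} \prod_{i=0}^{ak-1} \theta\big( (a_j x^j)^{p^i} \big) \\
  &= \prod_{j \in J} \prod_{i=0}^{k-1} \prod_{s=0}^{a-1} \theta\big( (a_j x^j)^{q^i p^s} \big) \\
  &= \prod_{i=0}^{k-1} \Big\{ \prod_{j \in J} \Big( \prod_{s=0}^{a-1} \theta\big( (a_j^{q^i} x^{q^i j})^{p^s} \big) \Big) \Big\} \\
  &= F^{(a)}(x) F^{(a)}(x^q) \cdots F^{(a)}(x^{q^{k-1}}),
\end{aligned}
\]

where $F^{(a)}(X)$ is given by

\[ F^{(a)}(X) = \prod_{j \in J} \prod_{s=0}^{a-1} \theta\big( (a_j X^j)^{p^s} \big). \]

(We pause to justify the steps above: the first four equalities follow straight
from definitions and from the homomorphic property of $\Psi_k$; the fifth and
sixth by rearrangement; and the seventh since $a_j$ satisfies $a_j^q = a_j$.)
Thus we have

\[ \sum_{\bar{x}} \Psi_k(\bar{x}_0 f(\bar{x}_1, \dots, \bar{x}_n)) = \sum_{x} F^{(a)}(x) F^{(a)}(x^q) \cdots F^{(a)}(x^{q^{k-1}}) \]

where the latter sum is over the Teichmüller lifting in $\Omega^{n+1}$ of points in
$(\mathbb{F}_{q^k}^*)^{n+1}$. Combining this with Lemma 4 gives us the result. □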

4.2. Decay rates and weight functions. We now describe the decay
rates of the coefficients of the power series $F^{(a)}$ and $F$. Specifically, we
obtain lower bounds for the p-adic order of the coefficients of the power
series $F$ expressed in terms of a certain weight function on integer vectors.

Write $F = \sum_r F_r X^r$ where the sum is over non-negative integer vectors
in $\mathbb{Z}_{\ge 0}^{n+1}$. Let $A$ be the $(n + 1) \times |J|$ matrix whose columns are $j =
(j_0, j_1, \dots, j_n) \in J$. Then from Definition 9 one sees

\[ F_r = \sum_{u} \Big( \prod_{j \in J} \lambda_{u_j} a_j^{u_j} \Big) \tag{6} \]

where the outer sum is over all $|J|$-tuples $u = (u_j)$ of non-negative integers
such that

\[ Au = r, \tag{7} \]

thinking of $u$ and $r$ as column vectors. Since $j_0 = 1$ for all $(j_0, \dots, j_n) \in J$,
the first row of the matrix $A$ is the vector $(1, 1, \dots, 1)$. The first equation
in the above linear system is then

\[ \sum_{j \in J} u_j = r_0. \tag{8} \]

Now $F_r$ is zero if (7) has no solutions. Otherwise, since $\mathrm{ord}(\lambda_{u_j} a_j^{u_j}) =
\mathrm{ord}(\lambda_{u_j})$, from (4), (6) and (8) we get

\[ \mathrm{ord}(F_r) \ge \inf_{u} \Big\{ \frac{p - 1}{p^2} \sum_{j \in J} u_j \Big\} = \frac{p - 1}{p^2} r_0 \tag{9} \]

where the inf is over all non-negative integer vector solutions $u$ of (7). We
now define a weight function $w$ such that (9) gives estimates on $\mathrm{ord}(F_r)$ in
terms of this weight function.

Let $\delta_1 \subseteq \mathbb{R}^n$ denote the convex hull of the support of $f$ (the set of exponents
of non-zero terms). Let $\delta_2 \subseteq \mathbb{R}^n$ be the convex hull of the origin and
the $n$ points $(d, 0, \dots, 0), (0, d, \dots, 0), \dots, (0, \dots, 0, d)$ where $d$ is the total
degree of $f$. We call $\delta_1$ the Newton polytope of $f$; the polytope $\delta_2$ is just a
simplex containing $\delta_1$.

Definition 12. Let $\delta$ be any convex polytope with integer vertices such that
$\delta_1 \subseteq \delta \subseteq \delta_2$. Denote by $\Delta$ the convex polytope in $\mathbb{R}^{n+1}$ obtained by embedding
$\delta$ in $\mathbb{R}^{n+1}$ via the map $x \mapsto (1, x)$ for $x \in \mathbb{R}^n$, and taking the convex hull
with the origin. Denote by $C(\Delta)$ the cone generated in $\mathbb{R}^{n+1}$ as the positive
hull of $\Delta$. So $C(\Delta)$ is the union of all rays emanating from the origin and
passing through $\Delta$.

Ultimately, in Sections 6.3.3 and 6.4 we shall only be interested in the
simplest choice of polytope $\delta = \delta_2$ (although the choice $\delta = \delta_1$ leads to the
most refined algorithm). Letting $\Delta_1$ denote the polytope in $\mathbb{R}^{n+1}$ obtained
by choosing $\delta = \delta_1$ we see that $C(\Delta_1)$ is the cone generated by the exponents
of non-zero terms in $X_0 f$. Equation (7) has no non-negative integer (or even
real) solutions when $r$ does not lie in $C(\Delta_1)$. Thus for any choice of $\delta$ ($\supseteq \delta_1$)
and corresponding $\Delta$ ($\supseteq \Delta_1$), all exponents of $F$ lie in the cone $C(\Delta)$.

Definition 13. Define a weight function $w$ from $\mathbb{R}^{n+1}$ to $\mathbb{R} \cup \{\infty\}$ in the following way: For $r = (r_0, r_1, \dots, r_n) \in \mathbb{R}^{n+1}$ define

\[ w(r) = \begin{cases} r_0 & \text{if } r \in C(\Delta), \\ \infty & \text{otherwise.} \end{cases} \]

In particular $w(r)$ is a non-negative integer for any $r \in C(\Delta) \cap \mathbb{Z}^{n+1}$.

Note 14. Choosing $\delta = \delta_1$ corresponds to working with the weight function of Adolphson and Sperber [2], and taking $\delta = \delta_2$ to Dwork's original weight function [4]. Note that Dwork's weight function can equivalently be defined as $w(r) = r_0$ if $r_1 + \cdots + r_n \le r_0 d$ and $\infty$ otherwise, where $d$ is the total degree of $f$.

The weight function has a simple geometric interpretation: For a real number $c$ define $c\Delta = \{cx \mid x \in \Delta\}$. The next lemma is straightforward.

Lemma 15. For any point $r \in C(\Delta)$ we have that $w(r)$ is the smallest non-negative number $c$ such that $r \in c\Delta$. If $r \notin C(\Delta)$ then $w(r) = \infty$.

By (9) and the sentence preceding Definition 13 we have the following result.

Lemma 16. We have the inequality

\[ \mathrm{ord}(F_r) \ge \frac{p-1}{p^2}\, w(r). \]

We also note the following simple property whose proof is straightforward.

Lemma 17. Let $r, r' \in \mathbb{R}^{n+1}$ and $k$ a non-negative integer. Then $w(kr) = k\,w(r)$ and $w(r + r') \le w(r) + w(r')$. In particular when $w(r') \ne \infty$,

\[ w(kr - r') \ge k\,w(r) - w(r'). \]
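The weight function is easy to experiment with for the simplex choice $\delta = \delta_2$. The following is a minimal sketch, assuming Dwork's description of the weight from Note 14 together with non-negativity of the coordinates; the function name `dwork_weight` is hypothetical and chosen only for illustration. It checks the two properties of Lemma 17 on sample vectors.

```python
import math

def dwork_weight(r, d):
    """Weight w(r) for the simplex choice delta = delta_2 (Dwork's weight).

    r is a tuple (r0, r1, ..., rn); d is the total degree of f.
    Returns r0 when r lies in the cone C(Delta), and math.inf otherwise.
    Assumption: the cone is {r : all r_i >= 0 and r1 + ... + rn <= d * r0}.
    """
    r0, rest = r[0], r[1:]
    in_cone = all(x >= 0 for x in r) and sum(rest) <= r0 * d
    return r0 if in_cone else math.inf

# Homogeneity and subadditivity (Lemma 17) on sample vectors with d = 3.
d = 3
r, s = (2, 4, 1), (1, 0, 3)
assert dwork_weight(tuple(3 * x for x in r), d) == 3 * dwork_weight(r, d)
total = tuple(x + y for x, y in zip(r, s))
assert dwork_weight(total, d) <= dwork_weight(r, d) + dwork_weight(s, d)
```

Points outside the cone, such as $(1, 4, 0)$ with $d = 3$, receive infinite weight, which matches the convention that the corresponding coefficients of $F$ vanish.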

We shall work in certain subrings of $R[[X]]$ defined in terms of the weight function.

Definition 18. Define $L_\Delta$ to be the subring of $R[[X]]$ given by

\[ L_\Delta = \Big\{ \sum_{r \in C(\Delta) \cap \mathbb{Z}^{n+1}} A_r X^r \;\Big|\; A_r \in R \Big\}. \]

Thus, $L_\Delta$ is just the ring of all power series over $R$ whose terms have exponents in the cone $C(\Delta)$. Certainly $F \in L_\Delta$ and from Lemma 10 we see easily that $F^{(a)} \in L_\Delta$.

Lemma 19. The power series $F$ and $F^{(a)}$ both belong to $L_\Delta$.

This concludes all results in this section which shall be essential to the 
proof of our modular version of Dwork's Trace Formula. We conclude with 
a definition and some comments which we will refer to in the analysis of the 
running time of our algorithm. 

Definition 20. For any positive real number $b$, we define a set of power series by

\[ L_\Delta(b) = \Big\{ \sum_r A_r X^r \in L_\Delta \;\Big|\; \mathrm{ord}\, A_r \ge b\,w(r) \Big\}. \]

The set $L_\Delta(b)$ is easily seen to be a subring of $L_\Delta$. Elements in $L_\Delta(b)$ for large $b$ can be thought of as having fast decaying coefficients. Such rings will reduce to rings of small dimension modulo small powers of $p$.

We have that

\[ F \in L_\Delta\Big( \frac{p-1}{p^2} \Big), \tag{10} \]

\[ F^{(a)} \in L_\Delta\Big( \frac{p-1}{qp} \Big). \tag{11} \]

The first is immediate from the inequality in Lemma 16 and the second follows since $\tau^i(F(X^{p^i})) \in L_\Delta\big((p-1)/p^{i+2}\big)$ for each $0 \le i \le a - 1$. Thus for $a > 1$ the coefficients of $F^{(a)}$ decay more slowly than those of $F$ itself.

5. Dwork's trace formula

5.1. Lifting Frobenius. We now introduce Dwork's "left inverse of Frobenius" mapping $\psi_p$ on the ring $R[[X]]$.

Definition 21. Let $\psi_p$ be defined on the monomials in $R[[X]]$ by

\[ \psi_p(X^r) = \begin{cases} X^{r/p} & \text{if } p \mid r, \\ 0 & \text{otherwise,} \end{cases} \]

and extend $\psi_p$ by $\tau^{-1}$-linearity to all of $R[[X]]$. That is

\[ \psi_p\Big( \sum_r A_r X^r \Big) = \sum_r \tau^{-1}(A_r)\,\psi_p(X^r) = \sum_r \tau^{-1}(A_{pr})\,X^r. \]

Here $p \mid r$ means that $p$ divides all of the entries in the integer vector $r$. This map is a left inverse of the "Frobenius" map on the ring $R[[X]]$ which takes a power series $\sum_r A_r X^r$ to $\sum_r \tau(A_r) X^{pr}$.
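The action of $\psi_p$ on exponents can be illustrated on polynomials stored as dictionaries. The sketch below is a simplification and not the general construction: it assumes $a = 1$, so that $\tau$ is the identity and $\psi_p$ is linear in the usual sense, and the names `psi_p` and `frobenius` are hypothetical.

```python
def psi_p(series, p):
    """Apply psi_p to a polynomial given as {exponent_tuple: coefficient}.

    Assumes a = 1, so tau is the identity and psi_p is plain linear:
    X^r -> X^(r/p) when p divides every entry of r, and X^r -> 0 otherwise.
    """
    out = {}
    for r, coeff in series.items():
        if all(e % p == 0 for e in r):
            out[tuple(e // p for e in r)] = coeff
    return out

def frobenius(series, p):
    """The map X^r -> X^(p r), again with tau taken to be the identity."""
    return {tuple(p * e for e in r): c for r, c in series.items()}

g = {(0, 0): 1, (3, 6): 5, (1, 3): 7, (2, 0): 4}
print(psi_p(g, 3))                      # only exponents divisible by 3 survive
assert psi_p(frobenius(g, 3), 3) == g   # psi_p is a left inverse of Frobenius
```

Note that `frobenius(psi_p(g, 3), 3)` is generally not equal to `g`, since $\psi_p$ discards every monomial whose exponent is not divisible by $p$; it is only a left inverse.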

Definition 22. Let $\alpha_a$ be the map from $R[[X]]$ to itself defined as

\[ \alpha_a = \psi_p^a \circ F^{(a)}(X). \]

Precisely, this is the map which is the composition of multiplication by the power series $F^{(a)}$ followed by the mapping $\psi_p^a$ on the ring $R[[X]]$. (Notice that $\psi_p^a$ just acts as

\[ \psi_p^a\Big( \sum_r A_r X^r \Big) = \sum_r A_{qr} X^r \]

since $\tau^{-a}(A_r) = A_r$.) Let the map $\alpha$ from $R[[X]]$ to itself be defined as

\[ \alpha = \psi_p \circ F. \]

Thus $\alpha$ is multiplication by $F$ followed by the mapping $\psi_p$.

We have the following result relating these two maps, which shall be of crucial importance in our derivation of an efficient algorithm.

Lemma 23. With $\alpha_a$ and $\alpha$ as in Definition 22 we have

\[ \alpha_a = \alpha^a \]

where the second exponent is a power under composition.

Proof. Firstly let $H \in R[[X]]$ and denote by $\psi_p \circ H(X^p)$ the map composed of multiplication by $H(X^p)$ followed by $\psi_p$. We claim that

\[ \psi_p \circ H(X^p) = \tau^{-1}(H(X)) \circ \psi_p. \tag{12} \]

To see this write $H(X^p) = \sum_r H_r X^{pr}$. Then we have that

\[ \psi_p \circ H(X^p) = \sum_r \tau^{-1}(H_r)\,(\psi_p \circ X^{pr}). \]

Here the infinite series is interpreted as a mapping. Now $\psi_p \circ X^{pr} = X^r \circ \psi_p$ as these two maps are $\tau^{-1}$-linear and agree on monomials. Hence we have $\psi_p \circ H(X^p) = \sum_r \tau^{-1}(H_r)(X^r \circ \psi_p) = \tau^{-1}(H(X)) \circ \psi_p$.

Next we claim that for any $b \ge 1$ and power series $H$ we have

\[ \psi_p^b \circ \prod_{i=0}^{b-1} \tau^i(H(X^{p^i})) = (\psi_p \circ H(X))^b. \]

We prove this by induction, the result trivially holding if $b = 1$. For $b > 1$, by $b - 1$ applications of (12) we get

\[ \psi_p^b \circ \prod_{i=0}^{b-1} \tau^i(H(X^{p^i})) = (\psi_p \circ H(X)) \circ \Big( \psi_p^{b-1} \circ \prod_{i=0}^{b-2} \tau^i(H(X^{p^i})) \Big). \]

The second claim then follows by induction. Putting $H = F$ and $b = a$ we get the required result. □

The map $\alpha_a$ is linear and continuous, in the sense that

\[ \alpha_a\Big( \sum_r A_r X^r \Big) = \sum_r A_r\,\alpha_a(X^r) \]

for any element $\sum_r A_r X^r \in R[[X]]$. The map $\alpha$ is $\tau^{-1}$-linear and continuous, in the sense that

\[ \alpha\Big( \sum_r A_r X^r \Big) = \sum_r \tau^{-1}(A_r)\,\alpha(X^r). \]

From Lemma 19 the next lemma follows easily.

Lemma 24. The subring $L_\Delta$ is stable under both $\alpha$ and $\alpha_a$.

Both maps when restricted to the subring $L_\Delta$ are determined by their action on the monomials $X^r$ (with $w(r) < \infty$), which we now consider.

5.2. Matrix representations of mappings. Recall that $C(\Delta)$ is the cone in $\mathbb{R}^{n+1}$ from Definition 12. The set

\[ \Gamma_\Delta = \{ X^u \mid u \in C(\Delta) \cap \mathbb{Z}_{\ge 0}^{n+1} \} \]

written as a row vector, is a formal basis for the space $L_\Delta$. Precisely, this means that any power series in $L_\Delta$ may be written in exactly one way as an infinite sum $\sum_u A_u X^u$ with $X^u$ in the above set. Notice that this is different from the usual notion of a basis in linear algebra, since we allow infinite combinations of basis elements. It is also different from the notion of an "orthonormal basis" in the literature, where one requires that the coefficient $A_u$ goes to zero as $w(u)$ goes to $\infty$ (see [21] for a more detailed discussion of these notions).

By Lemma 24 both $\alpha$ and $\alpha_a$ send the ring $L_\Delta$ to itself. We define certain matrices associated to the maps $\alpha$ and $\alpha_a$ restricted to $L_\Delta$ with regard to the formal row basis $\Gamma_\Delta$ of monomials.

Definition 25. Let the infinite matrices $M$ and $M_a$ have columns describing the images of the monomials $X^v \in L_\Delta$ under the maps $\alpha$ and $\alpha_a$ with respect to our formal row basis $\Gamma_\Delta$:

\[ \alpha(\Gamma_\Delta) = \Gamma_\Delta M, \qquad \alpha_a(\Gamma_\Delta) = \Gamma_\Delta M_a. \]

Specifically, the $(u,v)$th entries of $M$ and $M_a$ for $u, v \in C(\Delta)$ are $m_{uv}$ and $m^{(a)}_{uv}$ respectively where

\[ m_{uv} = \tau^{-1}(F_{pu-v}), \qquad m^{(a)}_{uv} = F^{(a)}_{qu-v}. \]

Here $F^{(a)} = \sum_r F^{(a)}_r X^r$ and as before $F = \sum_r F_r X^r$, and we take the coefficients of exponents $r$ with negative entries to be zero.
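The formula for $m_{uv}$ can be checked against the definition $\alpha = \psi_p \circ F$ on a toy example. The sketch below assumes $a = 1$ (so $\tau$ is trivial), stores a truncation of $F$ as a dictionary, and uses made-up coefficients; the helper names are hypothetical and not taken from the paper.

```python
def matrix_entry(F, p, u, v):
    """(u, v) entry of M when tau is trivial: the coefficient F_{pu - v}."""
    r = tuple(p * ui - vi for ui, vi in zip(u, v))
    return F.get(r, 0)   # exponents with negative entries are absent, hence 0

def alpha_on_monomial(F, p, v):
    """alpha(X^v) = psi_p(F * X^v), computed directly from the definitions."""
    shifted = {tuple(ri + vi for ri, vi in zip(r, v)): c for r, c in F.items()}
    return {tuple(e // p for e in r): c
            for r, c in shifted.items() if all(e % p == 0 for e in r)}

# Toy truncation of F in the variables (X0, X1) with p = 2; values are made up.
F = {(0, 0): 1, (1, 0): 2, (1, 1): 3, (2, 1): 4, (3, 2): 5}
v = (1, 1)
column = alpha_on_monomial(F, 2, v)
for u, coeff in column.items():
    # The coefficient of X^u in alpha(X^v) is the (u, v) entry of M.
    assert matrix_entry(F, 2, u, v) == coeff
```

Each column of $M$ is therefore read off from the coefficients of $F$ along the arithmetic progression $pu - v$, which is what Step 2 of the algorithm in Section 6 exploits.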

(Note that as yet we have not ordered the basis; however, we shall choose a convenient ordering in the proof of Theorem 28.) We have by Lemmas 16 and 17

\[ \mathrm{ord}(F_{pu-v}) \ge \frac{p-1}{p^2}\, w(pu - v) \ge \frac{p-1}{p}\Big( w(u) - \frac{1}{p}\, w(v) \Big) \tag{13} \]

and certainly $\mathrm{ord}(m_{uv}) = \mathrm{ord}(\tau^{-1}(F_{pu-v})) = \mathrm{ord}(F_{pu-v})$.

Notice that the matrix powers $M_a^k$ and $M^k$ are defined for every positive integer $k$ since the entries in $M_a^k$, say, are just finite sums of products of the entries in $M_a$ and $M_a^{k-1}$. This follows for $M_a$, say, since all entries $m^{(a)}_{uv}$ in $M_a$ are zero when the vector $qu - v$ contains negative entries. We define the trace of an infinite matrix to be the sum of its diagonal entries, when this sum converges, and $\infty$ when the sum does not converge. We shall see shortly that the trace of the infinite matrix $M_a^k$ is finite. We write this as $\mathrm{Tr}(M_a^k)$. We can now prove the following result of Dwork.

Theorem 26 (Dwork's Trace Formula). Let $f \in \mathbb{F}_q[X_1, \dots, X_n]$ and let $\alpha_a$ be the mapping on the ring $R[[X]]$ given as $\alpha_a = \psi_p^a \circ F^{(a)}$, where $F^{(a)}$ and $\psi_p$ are as defined earlier (see Definition 21 for $\psi_p$). Let $M_a$ denote the infinite matrix representing the map $\alpha_a$ restricted to the subring $L_\Delta$ as described in Definition 25. For $k \ge 1$, denote by $N_k^*$ the number of solutions to the equation $f = 0$ in the torus $(\mathbb{F}_{q^k}^*)^n$. Then

\[ (q^k - 1)^{n+1}\,\mathrm{Tr}(M_a^k) = q^k N_k^* - (q^k - 1)^n. \]

Proof. (The following is a hybrid of the matrix proof given in [19] and Dwork's original argument [4].)

By Proposition 11 we have that

\[ q^k N_k^* - (q^k - 1)^n = \sum_{x^{q^k-1} = 1} F^{(a)}(x)F^{(a)}(x^{q})\cdots F^{(a)}(x^{q^{k-1}}) \]

where the sum is over all $(n+1)$-tuples of $(q^k - 1)$st roots of unity in $\Omega$ (namely the Teichmüller lifting in $\Omega^{n+1}$ of points on the torus $(\mathbb{F}_{q^k}^*)^{n+1}$).

We first consider the case $k = 1$. Since $F^{(a)} \in L_\Delta$, we can write $F^{(a)} = \sum_r F^{(a)}_r X^r$ where the sum is over all lattice vectors $r$ which belong to $C(\Delta)$. Then the latter sum is

\[ \sum_{x^{q-1} = 1} F^{(a)}(x) = (q-1)^{n+1} \sum_{(q-1) \mid r} F^{(a)}_r = (q-1)^{n+1} \sum_s F^{(a)}_{(q-1)s} = (q-1)^{n+1}\,\mathrm{Tr}(M_a). \]

Here by $(q-1) \mid r$ we mean that the integer $q - 1$ divides every entry in the vector $r$. Also, we use the fact that for any integer $r_i$ [HI page 120]

\[ \sum_{x_i^{q-1} = 1} x_i^{r_i} = \begin{cases} q - 1 & \text{if } (q-1) \mid r_i, \\ 0 & \text{otherwise.} \end{cases} \]

For $k \ge 1$, define $M_{ak}$ to be the matrix for the map $\psi_p^{ak} \circ \prod_{i=0}^{k-1} F^{(a)}(X^{q^i})$ with respect to the formal basis $\Gamma_\Delta$. Then by an analogous argument to that in the case $k = 1$ we see

\[ \sum_{x^{q^k-1} = 1} \prod_{i=0}^{k-1} F^{(a)}(x^{q^i}) = (q^k - 1)^{n+1}\,\mathrm{Tr}(M_{ak}). \]

Using (12) in the proof of Lemma 23 one sees that

\[ \psi_p^{ak} \circ \prod_{i=0}^{k-1} F^{(a)}(X^{q^i}) = (\psi_p^a \circ F^{(a)})^k. \]

Since both maps are linear it follows that $M_{ak} = M_a^k$, and the theorem is proved. □

To compute the trace of the matrix in Dwork's formula we shall use the following matrix identity derived from Lemma 23.

Lemma 27. Let $M_a$ and $M$ denote the matrices for the maps $\alpha_a$ and $\alpha$ described above. Then

\[ M_a = \prod_{i=0}^{a-1} \tau^{-i}(M) = M\,M^{\tau^{-1}} \cdots M^{\tau^{-(a-1)}} \]

where the map $\tau^{-i}$ acts entrywise on the matrix $M$.

Proof. First suppose that $N_1$ and $N_2$ are matrices representing maps $\beta_1$ and $\beta_2$ on some subspace of $R[[X]]$. We assume that $\beta_1$ is $\tau^{-1}$-linear and $\beta_2$ is $\tau^{-j}$-linear for some $j \in \mathbb{Z}$. Then it is not difficult to prove that the matrix for the $\tau^{-(j+1)}$-linear map $\beta_1 \circ \beta_2$ is just $N_1 \tau^{-1}(N_2)$.

By Lemma 23 we have $\alpha_a = \alpha^a$. We claim that for any positive integer $b$ the matrix for the map $\alpha^b$ is $\prod_{i=0}^{b-1} \tau^{-i}(M)$. The result is trivially true if $b = 1$. For $b > 1$ we have $\alpha^b = \alpha \circ \alpha^{b-1}$. By induction the matrix for $\alpha^{b-1}$ is $\prod_{i=0}^{b-2} \tau^{-i}(M)$. By $\tau^{-1}$-linearity of $\alpha$ it follows from the observations in the preceding paragraph that the matrix for $\alpha^b$ is $M\,\tau^{-1}\big(\prod_{i=0}^{b-2} \tau^{-i}(M)\big) = \prod_{i=0}^{b-1} \tau^{-i}(M)$. The required result now follows by taking $b = a$. □

We note in passing that $M_a$ in Lemma 27 also equals

\[ \tau^{a-1}(S)\,\tau^{a-2}(S) \cdots S, \]

where $S = \tau(M)$ is the matrix with $(u,v)$th entry simply $F_{pu-v}$. Although this is slightly more desirable from a practical point of view we shall not use this expression for $M_a$.
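The twisted product of Lemma 27 can be tested numerically in a small model. The sketch below rests on assumptions that are not in the text: it takes $p = 3$ and $a = 2$, models $R/(p^N)$ as $\mathbb{Z}[i]/(3^4)$ with $\tau$ acting as complex conjugation, and uses hypothetical helper names. It compares the naive product with a doubling scheme based on a direct generalisation of the composition rule in the proof above, namely that the matrix for $\alpha^{r+s}$ is the matrix for $\alpha^r$ times $\tau^{-r}$ applied to the matrix for $\alpha^s$.

```python
P, N_PREC = 3, 4
MOD = P ** N_PREC   # toy model of R/(p^N): Z[i]/(3^4), with a = 2

def mul(x, y):
    """Multiply x = (x0, x1) and y = (y0, y1), read as x0 + x1*i, modulo MOD."""
    return ((x[0] * y[0] - x[1] * y[1]) % MOD, (x[0] * y[1] + x[1] * y[0]) % MOD)

def add(x, y):
    return ((x[0] + y[0]) % MOD, (x[1] + y[1]) % MOD)

def tau_inv(x, times=1):
    """tau^{-times}; conjugation has order 2, so only the parity of times matters."""
    return x if times % 2 == 0 else (x[0], (-x[1]) % MOD)

def mat_mul(A, B):
    size = len(A)
    C = [[(0, 0)] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            acc = (0, 0)
            for l in range(size):
                acc = add(acc, mul(A[i][l], B[l][j]))
            C[i][j] = acc
    return C

def mat_tau_inv(A, times):
    return [[tau_inv(x, times) for x in row] for row in A]

def naive_twisted_power(M, b):
    """Matrix of alpha^b as M * tau^{-1}(M) * ... * tau^{-(b-1)}(M)."""
    result = M
    for i in range(1, b):
        result = mat_mul(result, mat_tau_inv(M, i))
    return result

def fast_twisted_power(M, b):
    """Same matrix by doubling: matrix(alpha^(r+s)) = matrix(alpha^r) * tau^{-r}(matrix(alpha^s))."""
    result, r = None, 0      # result is the matrix of alpha^r (None while r = 0)
    square, s = M, 1         # square is the matrix of alpha^s, with s a power of 2
    while b:
        if b & 1:
            result = square if result is None else mat_mul(result, mat_tau_inv(square, r))
            r += s
        b >>= 1
        if b:
            square = mat_mul(square, mat_tau_inv(square, s))
            s *= 2
    return result

M = [[(1, 2), (0, 5)], [(7, 1), (3, 0)]]
for b in range(1, 8):
    assert fast_twisted_power(M, b) == naive_twisted_power(M, b)
```

The doubling version uses a number of matrix products logarithmic in $b$, at the cost of applying a power of $\tau$ entrywise before each product.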

5.3. Modular reduction of the trace formula. We now examine the reduction of the Dwork trace formula modulo a power $p^N$ of $p$. Observe that both sides in the trace formula, and all the entries in the matrices $M$ and $M_a$, are elements of $R$, and so this reduction is defined.

We first recall some notation: Let $C(\Delta)$ be the cone in $\mathbb{R}^{n+1}$ from Definition 12 and $L_\Delta$ denote the ring of power series over $R$ whose monomials have exponents lying in $C(\Delta)$ (Definition 18). Let $M$ denote the matrix for the $\tau^{-1}$-linear map $\alpha = \psi_p \circ F$ with respect to the formal row basis $\Gamma_\Delta$ (Definition 25). Here $\psi_p$ is the "left inverse of Frobenius" given in Definition 21 and $F$ is the power series obtained from the polynomial $f$ as in Definition 9.

Theorem 28. Let $N$ denote any positive integer and $A_N$ the finite square matrix over the finite ring $R/(p^N)$ obtained by reducing modulo $p^N$ all those entries in $M$ whose rows and columns are indexed by vectors $u \in C(\Delta) \cap \mathbb{Z}_{\ge 0}^{n+1}$ with $w(u) < (p/(p-1))^2 N$. Then

\[ (q^k - 1)^{n+1}\,\mathrm{Tr}\Big( \Big( \prod_{i=0}^{a-1} \tau^{-i}(A_N) \Big)^{k} \Big) \equiv q^k N_k^* - (q^k - 1)^n \bmod p^N \]

where $N_k^*$ is the number of solutions to the equation $f = 0$ in the affine torus $(\mathbb{F}_{q^k}^*)^n$. Moreover, the size of $A_N$ is $W = \#(t\Delta)$, where $t = \lceil p^2 N/(p-1)^2 \rceil - 1$ and $\#(t\Delta)$ is the number of lattice points in a dilation by a factor $t$ of the polytope $\Delta$.

Proof. For any finite or infinite matrix $L$ with coefficients in $R$, we define $\overline{L}$ to be the matrix obtained by reducing all its entries modulo $p^N$. Thus $\overline{L}$ has entries in $R/(p^N)$.

The theorem will follow from the Dwork Trace Formula (Theorem 26) once we find a suitable expression for the reduction modulo $p^N$ of the trace of the matrix $M_a^k$. This is equal to the trace of the matrix $\overline{M_a^k} = (\overline{M_a})^k$. By Lemma 27 this matrix can be computed as a matrix product from the matrix $\overline{M}$.

By inequality (13) every entry $m_{uv} = \tau^{-1}(F_{pu-v})$ in $M$ satisfies

\[ \mathrm{ord}(m_{uv}) \ge \frac{p-1}{p}\Big( w(u) - \frac{1}{p}\, w(v) \Big). \]

Define $t$ to be the greatest integer less than $(p/(p-1))^2 N$. When $w(u) \ge w(v)$ and $w(u) > t$ we have

\[ \mathrm{ord}(m_{uv}) \ge \Big( \frac{p-1}{p} \Big)^2 w(u) \ge N. \tag{14} \]

Recall that we have not yet ordered the basis. Now choose any total ordering on the basis set such that for distinct lattice points $u, v \in C(\Delta)$, the monomial $X^v$ comes before $X^u$ if $w(v) < w(u)$. Thus, by the inequalities in (14) and the choice of ordering of the basis, the matrix $\overline{M}$ is of the form

\[ \overline{M} = \begin{pmatrix} A_N & * \\ 0 & C_N \end{pmatrix} \tag{15} \]

where $C_N$ is a strictly upper triangular infinite matrix and $A_N$ is the finite square matrix indexed by lattice points $u \in C(\Delta)$ such that $w(u) \le t$. The size of $A_N$ is the number of lattice points $u$ with $w(u) \le t$. By Lemma 15 this is exactly the number of lattice points in the polytope $t\Delta$.

By Lemma 27 and modular reduction, one has

\[ (\overline{M_a})^k = \Big( \prod_{i=0}^{a-1} \tau^{-i}(\overline{M}) \Big)^{k} = \begin{pmatrix} \big( \prod_{i=0}^{a-1} \tau^{-i}(A_N) \big)^{k} & * \\ 0 & C'_N \end{pmatrix} \]

where $C'_N = \big( \prod_{i=0}^{a-1} \tau^{-i}(C_N) \big)^{k}$ is strictly upper triangular.

Hence the trace of $(\overline{M_a})^k$ equals the trace of the finite matrix

\[ \Big( \prod_{i=0}^{a-1} \tau^{-i}(A_N) \Big)^{k}. \tag{16} \]

The theorem now follows from (16) and Theorem 26. □

6. Algorithms

In this section we present an algorithm for counting points based upon Theorem 28 and complete the proofs of the results in the introduction. This is a relatively straightforward matter, although the precise complexity estimates require a little care.

6.1. Toric point counting algorithm. We first give the algorithm.

Algorithm 29 (Toric point counting).

Input: Positive integers $a, k, n, d$ and a prime $p$; a polynomial $f$; a polytope $\Delta$. (Here $f$ is a polynomial in $n$ variables of total degree $d$ with coefficients in the field $\mathbb{F}_q$ where $q = p^a$. The polytope $\Delta$ is as in Definition 12. We assume a model of $\mathbb{F}_q$ is given as in Section 3.2.1.)

Output: The number of solutions $N_k^*$ to the equation $f = 0$ in the torus $(\mathbb{F}_{q^k}^*)^n$.

Step 0: Set $N = (n+1)ak$, where $q = p^a$.

Step 1: Compute the polynomial $F \bmod p^N$ in the ring $(R/(p^N))[X]$ where $F$ is the power series in Definition 9. Specifically, writing $X_0 f = \sum_{j \in J} a_j X^j$ we have $F = \prod_{j \in J} \theta(\hat{a}_j X^j) \bmod p^N$. Here $\hat{a}_j$ is the Teichmüller lifting of $a_j$ and $\theta(z) = \exp(\pi(z - z^p))$. (See Section 3 for a description of the ring $R/(p^N)$ and the element $\pi$.)

Step 2: Construct the matrix $A_N$ which occurs in the statement of Theorem 28. Specifically, the matrix $A_N$ is indexed by pairs $(u,v)$ where $u$ and $v$ are lattice points in the dilation by a factor $t$ of the polytope $\Delta$, and $t = \lceil p^2 N/(p-1)^2 \rceil - 1$. The $(u,v)$th entry of $A_N$ is $\tau^{-1}$ of the coefficient of $X^{pu-v}$ in the polynomial $F \bmod p^N$. The action of $\tau$ is as described in Section 3.

Step 3: Compute the product $\big( \prod_{i=0}^{a-1} \tau^{-i}(A_N) \big)^{k}$. Let $T$ denote the trace of this product.

Step 4: Output

\[ N_k^* = q^{-k}\big[ \big( (q^k - 1)^{n+1}\,T + (q^k - 1)^n \big) \bmod p^N \big], \]

where the square brackets denote the smallest non-negative residue modulo $p^N$.

In the algorithm we assume that the polynomial is presented as input explicitly via its list of non-zero terms. The manner of presentation of $\Delta$ only affects the time required to find all lattice points in the dilated polytope $t\Delta$. For concreteness, let us say it is presented via its list of vertices, although any other reasonable presentation would suffice.

6.2. Proof of correctness of the algorithm. We know by Theorem 28 that

\[ q^k N_k^* - (q^k - 1)^n \equiv (q^k - 1)^{n+1}\,\mathrm{Tr}\Big( \Big( \prod_{i=0}^{a-1} \tau^{-i}(A_N) \Big)^{k} \Big) \bmod p^N \]

in the ring $R/(p^N)$. Thus,

\[ q^k N_k^* \equiv (q^k - 1)^n + (q^k - 1)^{n+1}\,T \bmod p^N. \]

The lefthand side is a non-negative integer. The first term on the righthand side is an integer. The second term on the righthand side is the reduction modulo $p^N$ of $(q^k - 1)^{n+1}\,\mathrm{Tr}(M_a^k)$, which is known to be an integer by Dwork's Trace Formula (Theorem 26), and thus it is also an integer. Since $N_k^* \le (q^k - 1)^n$ it follows that the lefthand side is smaller than $q^{k(n+1)}$. Hence in the case $N \ge ak(n+1)$ we must have

\[ q^k N_k^* = \big[ \big( (q^k - 1)^{n+1}\,T + (q^k - 1)^n \big) \bmod p^N \big]. \]

The proof is complete.
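The recovery in Step 4 can be exercised on a toy instance without building the matrix $A_N$. In the sketch below the trace residue $T$ is not computed by Dwork's method: it is synthesised from a brute-force count by inverting the congruence above, purely to illustrate that the precision $N = (n+1)ak$ pins down $N_k^*$. The function names are hypothetical, and the example is restricted to $a = k = 1$.

```python
from itertools import product

def torus_count(f, p, n):
    """Brute-force count of solutions of f = 0 with all coordinates in F_p^*."""
    return sum(1 for x in product(range(1, p), repeat=n) if f(*x) % p == 0)

def step4(T, p, a, k, n):
    """Recover N_k^* from the trace residue T mod p^N, with N = (n + 1) a k."""
    N = (n + 1) * a * k
    qk = p ** (a * k)
    residue = ((qk - 1) ** (n + 1) * T + (qk - 1) ** n) % p ** N
    assert residue % qk == 0          # the residue equals q^k N_k^*
    return residue // qk

# Toy instance: f = X1 + X2 + 1 over F_5, so a = k = 1 and n = 2.
p, a, k, n = 5, 1, 1, 2
f = lambda x, y: x + y + 1
true_count = torus_count(f, p, n)

# Synthesise T from the known count; q^k - 1 is a unit modulo p^N.
N = (n + 1) * a * k
q = p ** (a * k)
T = (q * true_count - (q - 1) ** n) * pow((q - 1) ** (n + 1), -1, p ** N) % p ** N
assert step4(T, p, a, k, n) == true_count
```

The three-argument `pow` with exponent `-1` computes a modular inverse and requires Python 3.8 or later.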

6.3. Complexity analysis. We shall use Big-Oh and Soft-Oh notation in our analysis of the complexity of the above algorithm. If $C_1$ and $C_2$ are real functions we write $C_1 = O(C_2)$ if $|C_1| \le c(|C_2| + 1)$ for some positive constant $c$. We write $C_1 = \tilde{O}(C_2)$ if $C_1 = O(C_2 \log(|C_2| + 1)^{c'})$ for some constant $c'$. Thus in the latter notation one ignores logarithmic factors.

6.3.1. Ring operations. We shall first of all count the number of operations in the ring $R/(p^N)$ required in Steps 1, 2, 3, ignoring for the time being any other auxiliary computations. More precisely, because of the complexity bounds in Lemma 5 it is convenient for our analysis to define a "ring operation" to be either arithmetic or the evaluation of the map $\tau^i$, for $1 \le i \le a - 1$, in the ring $R/(p^N)$ (excluding precomputation). In Section 6.3.2 we shall add back in the small contribution from computing Teichmüller liftings and also the precomputation required for the maps $\tau^i$. Similarly, in Section 6.3.3 we shall account for the remaining operations in Steps 1, 2, 3, arising mainly from computations with the exponents of polynomials (at this stage we will restrict the input polytope $\Delta$ to avoid complications from convex geometry). The contributions from Steps 0 and 4 are easily seen to be absorbed into the other estimates, and we shall not mention them again.

Our running time will be in terms of the parameters $t, \tilde{t}, W, \tilde{W}$. Here

\[ t = \Big\lceil \Big( \frac{p}{p-1} \Big)^2 N \Big\rceil - 1, \quad W = \#(t\Delta), \qquad \tilde{t} = \Big\lceil \frac{p^2}{p-1}\, N \Big\rceil - 1, \quad \tilde{W} = \#(\tilde{t}\Delta), \]

where $N = ak(n+1)$ and the $\#$ operator counts lattice points in convex sets. The integers $W$ and $\tilde{W}$ are the numbers of lattice points in certain "truncated" cones. Since $\tilde{t}$ is about $p$ times as large as $t$, the integer $\tilde{W}$ will be around $p^{n+1}$ times as large as $W$, since we are working in $(n+1)$-dimensional space. The integer $W$ is precisely the size of the matrix which occurs in Step 2. The integer $\tilde{W}$ will turn out to be the maximum number of terms in the polynomial we compute in Step 1. Define $L_\Delta((p-1)/p^2) \bmod p^N$ to be the ring of polynomials obtained by reducing the coefficients of power series in $L_\Delta((p-1)/p^2)$ modulo $p^N$. Then $\tilde{W}$ is the number of monomials which occur in the finite ring $L_\Delta((p-1)/p^2) \bmod p^N$.

For Step 1 we have the following estimate.

Lemma 30. Let $F$ be the power series given in Definition 9 and $N$ a positive integer. The polynomial $F \bmod p^N$ may be computed in

\[ O(|J|\,\tilde{W}^2) \]

operations in the ring $R/(p^N)$.

Proof. By the decay estimate for the coefficients of $\theta$ we have that $\theta(z) \bmod p^N$ is a polynomial of degree not greater than $p^2 N/(p-1)$. Thus we can obtain $\theta(z)$ via the formula

\[ \theta(z) = \exp(\pi z)\,\exp(-\pi z^p) \]

by computing the first $O(pN)$ terms in the expansion for $\exp(\pi z)$, substituting $z = -z^p$, and one multiplication of polynomials. Note that $\exp(\pi z)$ has p-adic integral coefficients. Thus $\theta(z)$ can be found in $O((pN)^2)$ operations in the ring $R/(p^N)$ using standard polynomial arithmetic.

Now by (10) we have that $F \in L_\Delta((p-1)/p^2)$. Thus $F \bmod p^N \in L_\Delta((p-1)/p^2) \bmod p^N$. One may then compute $F \bmod p^N$ directly from Definition 9 in $|J| - 1$ multiplications of polynomials of the form $\theta(\hat{a}_j X^j) \bmod p^N$. Each such polynomial lies in the ring $L_\Delta((p-1)/p^2) \bmod p^N$, because $\mathrm{ord}(\hat{a}_j) = 0$, $w(j) = 1$ and the coefficients of $\theta$ decay at a suitable rate. Hence all computations required in computing $F \bmod p^N$ involve polynomials in this ring. Such polynomials have at most $\tilde{W}$ terms. Exactly $|J| - 1$ multiplications are required. Thus the complexity is $O(|J|\,\tilde{W}^2)$ ring operations. Noting that $pN = O(\tilde{W})$ we have the result. □

Note 31. It is crucial here that we only need to compute $F \bmod p^N$ and not $F^{(a)} \bmod p^N$, as one might attempt to do using a more naive approach. The latter polynomial has very high degree ($O(q)$) because of the slow decay rate of the coefficients of $F^{(a)}$.

With regard to Step 2, given that the polynomial $F \bmod p^N$ has already been computed, the only task required is to identify those pairs of points $(u, v)$ such that $u, v \in t\Delta$, compute the integer point $pu - v$, and copy $\tau^{-1}$ of the term $F_{pu-v} \bmod p^N$ from $F \bmod p^N$ into the correct position in the matrix. Thus no arithmetic operations in the ring are required here, except $W^2$ computations of $\tau^{-1}$. These operations can safely be ignored since in Lemma 30 we have already counted $O(|J|\,\tilde{W}^2)$ ring operations. Computation of the appropriate indices $(u, v)$ does require one to find all lattice points in certain polytopes and we return to that in Section 6.3.3.

Finally, for Step 3 we have the following estimate.

Lemma 32. With the notation as in the statement of Theorem 28, the matrix

\[ \Big( \prod_{i=0}^{a-1} \tau^{-i}(A_N) \Big)^{k} \]

can be computed given the matrix $A_N$ in

\[ O(W^3 \log(ak)) \]

operations in the ring $R/(p^N)$.

Proof. A fast square-and-multiply style algorithm may be used to compute the power $\alpha^a$ in $O(\log a)$ "matrix ring operations". Specifically, working with the matrix representations one may compute $\alpha^{r+s}$ from $\alpha^r$ and $\alpha^s$ using

\[ M_{r+s} = M_r\,\tau^{-r}(M_s), \]

where $M_r$ denotes the matrix for $\alpha^r$. Now the case $r = s = 2^c$ for some $c$ gives us the "square" step (computing $\alpha^{2^{c+1}}$ from $\alpha^{2^c}$) and the case $r = 2^c$ and $s$ arbitrary the "multiply" step. These two operations may be combined to give a fast exponentiation method in a straightforward way. The time required to compute a matrix for $\alpha^a$ from one for $\alpha$ is thus $O(\log a)$ "matrix ring operations". By matrix ring operations we mean multiplication of matrices of size $W$ over $R/(p^N)$, and also computing $\tau^{-i}(B)$ for some $1 \le i \le a - 1$ and matrix $B$ of this form. The former requires $O(W^3)$ operations in the ring $R/(p^N)$ using standard algorithms. Since $\tau^{-i} = \tau^{a-i}$ the latter may be computed in $O(W^2)$ ring operations (that is, applications of a power of $\tau$). Thus the time for computing a matrix for $\alpha^a$ from one for $\alpha$ is $O(W^3 \log a)$.

Having obtained a matrix for $\alpha^a$ one may then compute a matrix for $\alpha^{ak}$ using the standard square-and-multiply algorithm. This requires $O(\log k)$ matrix multiplications, that is $O(W^3 \log k)$ operations in $R/(p^N)$. Thus the total time required is as claimed. □

Note that the exponent 3 for multiplication of matrices can be improved to around 2.4 using faster methods [7, page 330].

Gathering these results we find the following.

Lemma 33. The running time of Algorithm 29 is

\[ O(\tilde{W}^2 |J| + W^3 \log(ak)) \]

ring operations. Here $|J|$ is the number of non-zero terms in $f$, and $W$ and $\tilde{W}$ are defined as at the start of Section 6.3.1 with $N = ak(n+1)$.

6.3.2. Bit complexity arising from ring operations. Using Lemma 5 one may now calculate the number of bit operations in the algorithm which arise from operations in the ring $R/(p^N)$. In this section we shall also count the small contribution from computing the Teichmüller lifting of the coefficients of $f$, and also the precomputation required for powers of $\tau$, which it was convenient to ignore in Section 6.3.1.

Definition 34. Let the polytope $\Delta$ from Definition 12 have dimension $\tilde{n} \le n + 1$. Let $V(\Delta)$ be the $\tilde{n}$-dimensional volume of $\Delta$. Denote by $v = \tilde{n}!\,V(\Delta)$ the "normalised" volume of $\Delta$.

Since $f$ is non-zero we have $\tilde{n} \ge 1$ and certainly $v > 0$.

Proposition 35. The running time of Algorithm 29 is

\[ \tilde{O}(a^{3n+7} k^{3n+5} n^{3n+5} v^3 p^{2n+4}) \]

bit operations, plus the contribution from operations outside of the ring $R/(p^N)$. The space complexity in bits is

\[ \tilde{O}(a^{2n+4} k^{2n+3} n^{2n+3} v^2 p) \]

plus the contribution from operations outside of $R/(p^N)$.

Proof. To compute the complexity first observe that $t = \lceil p^2 N/(p-1)^2 \rceil - 1 < 4N$. Thus $t, N = O(akn)$ in the algorithm. Also $\tilde{t} = O(pN) = O(pt)$. Now

\[ W = \#(t\Delta) \le \tilde{n}!\,V(t\Delta) + \tilde{n} = \tilde{n}!\,t^{\tilde{n}}\,V(\Delta) + \tilde{n} = O(v\,t^{n+1}). \]

Here we use the Blichfeldt bound $\#(P) \le m!\,V(P) + m$ for any $m$-dimensional polytope $P$ from [9, page 144], the fact $V(t\Delta) = t^{\tilde{n}} V(\Delta)$ since $\Delta$ is $\tilde{n}$-dimensional, and also that $\tilde{n} \le n + 1$. Similarly $\tilde{W} = \#(\tilde{t}\Delta) = O(v\,\tilde{t}^{\,n+1}) = O(v\,t^{n+1} p^{n+1})$.

Thus from Lemma 33 the number of ring operations is

\[ O\big( (v\,t^{n+1} p^{n+1})^2 (v + n) + (t^{n+1} v)^3 \log(ak) \big) \]

since $|J| \le \#(\Delta) - 1 \le v + n$. Thus the bit complexity which arises from ring operations is by Lemma 5

\[ \tilde{O}\big( \{ (v\,t^{n+1} p^{n+1})^2 (v + n) + (t^{n+1} v)^3 \log(ak) \}\,(p\,a\,N \log p)^2 \big). \]

Tidying up and ignoring logarithmic factors we get

\[ \tilde{O}\big( v^3 t^{2n+2} p^{2n+4} a^2 N^2 + v^3 t^{3n+3} p^2 a^2 N^2 \big). \]

The second term is dominant in all factors except $p$. For simplicity we take the estimate of

\[ \tilde{O}\big( v^3 t^{3n+3} p^{2n+4} a^2 N^2 \big). \]

Now putting $t, N = O(ank)$ we get

\[ \tilde{O}\big( v^3 a^{3n+7} k^{3n+5} n^{3n+5} p^{2n+4} \big). \]

This is the total bit complexity which arises from "ring operations", as defined at the start of Section 6.3.1. There remains the contribution from computing the Teichmüller liftings of the coefficients of $f$ in Step 1, and also the precomputation required for the map $\tau$. By Lemma 5 this is easily seen to be absorbed in the above estimate.

With regard to the space complexity, this is dominated by the space required to store the matrix $A_N$, which is $O(W^2)$ ring elements. Putting $W = O(v(ank)^{n+1})$ and using Lemma 5 gives us the result. □

One may replace $n$ in the exponents in Proposition 35 by $\tilde{n} - 1$; however, this only gives an improvement when the Newton polytope is not full-dimensional.

6.3.3. Bit complexity arising from auxiliary operations. It remains to bound the complexity which arises from operations outside of the ring $R/(p^N)$ in Steps 1 and 2. In Step 1 manipulation of exponents of polynomials will add an extra term $O(\log(pNd))$ to the running time, which can safely be ignored.

In Step 2 one is required to find all lattice points which lie in $t\Delta$, for $t = O(ank)$. The complexity of this step will depend upon the input polytope $\Delta$. For a "general" $\Delta$ one requires methods from computational convex geometry which are not in the spirit of the present exposition. Thus our total bit complexity estimate for Algorithm 29 will just be as in Proposition 35 "plus the contribution from finding all lattice points in $t\Delta$".

At this stage for simplicity we shall restrict to the choice of $\delta = \delta_2$ in Definition 12. Thus we take $\Delta = \Delta_2$ as the convex hull in $\mathbb{R}^{n+1}$ of the origin and the $n + 1$ points

\[ (1, 0, \dots, 0),\ (1, d, 0, \dots, 0),\ \dots,\ (1, 0, \dots, 0, d). \]

For this case the required set of lattice points is

\[ \{ (r_0, r_1, \dots, r_n) \mid r_1 + \cdots + r_n \le d\,r_0 \le d\,t \} \]

and so no computations are required here. Also, now $v$ equals $d^n$ and directly from Proposition 35 we get the following result.

Proposition 36. Consider Algorithm 29 with input restricted to the choice of polytope $\Delta = \Delta_2$ described in the preceding paragraph. The total running time of this restricted algorithm is

\[ \tilde{O}(a^{3n+7} k^{3n+5} n^{3n+5} d^{3n} p^{2n+4}) \]

bit operations. The total space complexity in bits is

\[ \tilde{O}(a^{2n+4} k^{2n+3} n^{2n+3} d^{2n} p). \]
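For the simplex $\Delta_2$ the index set of the matrix $A_N$ can be listed directly. The following sketch enumerates the lattice points of $t\Delta_2$ using the description above, with the coordinates additionally taken to be non-negative, and compares their number with the Blichfeldt-type bound $v\,t^{n+1} + (n+1)$ from the proof of Proposition 35 with $v = d^n$. The function names and the sample parameters are hypothetical.

```python
from itertools import product

def truncation_level(p, N):
    """t = ceil(p^2 N / (p - 1)^2) - 1, computed with exact integer arithmetic."""
    return -(-(p * p * N) // ((p - 1) ** 2)) - 1

def lattice_points(t, n, d):
    """Integer points (r0, ..., rn) with ri >= 0 and r1 + ... + rn <= d*r0 <= d*t."""
    points = []
    for r0 in range(t + 1):
        for rest in product(range(d * r0 + 1), repeat=n):
            if sum(rest) <= d * r0:
                points.append((r0,) + rest)
    return points

p, a, k, n, d = 3, 1, 1, 2, 2
N = (n + 1) * a * k                  # Step 0 of the algorithm
t = truncation_level(p, N)
W = len(lattice_points(t, n, d))     # size of the matrix A_N for Delta = Delta_2
print(t, W)

assert W <= d ** n * t ** (n + 1) + (n + 1)   # Blichfeldt-type bound with v = d^n
assert t < 4 * N                              # the estimate t < 4N used in the proof
```

The enumeration is exponential in $n$, in line with the factor $(ank)^{n+1}$ in the bound for $W$.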

We shall use this restricted version of Algorithm 29 in the proofs of our main results in the next section.

6.4. Proofs of the results in the Introduction. To compute the number of points on the affine variety defined by a polynomial $f$ one simply uses the torus decomposition of $\mathbb{F}_{q^k}^n$. Specifically, for any subset $S \subseteq \{1, 2, \dots, n\}$ let $G_S$ denote the set of points

\[ \{ (x_1, \dots, x_n) \mid x_i \in \mathbb{F}_{q^k},\ x_i = 0 \Leftrightarrow i \in S \}. \]

Denote by $f^S$ the polynomial obtained from $f$ by setting to zero all indeterminates $X_i$ which occur in $f$ for $i \in S$. Denote by $N_k^S$ the number of solutions of $f^S = 0$ in the torus $G_S$ of dimension $n - |S|$. Then $N_k = \sum_S N_k^S$ where the sum is over all subsets of $\{1, 2, \dots, n\}$. Each number $N_k^S$ can be computed using Algorithm 29. (If some $f^S$ is identically zero or has degree 0 then $N_k^S = (q^k - 1)^{n - |S|}$ or 0, respectively, and Algorithm 29 is not required!) Thus by $2^n$ applications of this algorithm we obtain $N_k$ as desired.
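The torus decomposition can be confirmed by exhaustive search on a small example. In the sketch below the per-stratum counts are obtained by brute force rather than by Algorithm 29; the polynomial, the prime and the function names are hypothetical choices made for illustration, with $k = a = 1$.

```python
from itertools import combinations, product

def count_affine(f, p, n):
    """Number of points of f = 0 in F_p^n by exhaustive search."""
    return sum(1 for x in product(range(p), repeat=n) if f(x) % p == 0)

def count_on_stratum(f, p, n, S):
    """Solutions of f^S = 0 on G_S: coordinates in S are zero, the rest lie in F_p^*."""
    free = [i for i in range(n) if i not in S]
    total = 0
    for values in product(range(1, p), repeat=len(free)):
        x = [0] * n
        for i, val in zip(free, values):
            x[i] = val
        if f(x) % p == 0:      # evaluating f with zeros on S is evaluating f^S
            total += 1
    return total

p, n = 7, 3
f = lambda x: x[0] ** 2 * x[1] + x[1] * x[2] + x[0] + 3
strata = [set(S) for m in range(n + 1) for S in combinations(range(n), m)]
assert len(strata) == 2 ** n
assert sum(count_on_stratum(f, p, n, S) for S in strata) == count_affine(f, p, n)
```

The strata $G_S$ partition affine space, so the identity holds for every polynomial; the example simply makes the bookkeeping over the $2^n$ subsets explicit.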

Now to obtain the whole zeta function Z(f /¥ q ) it suffices to count Nk for 
all k = 1, ... , deg(r) + deg(s), where 

Z(f/¥ q ){T) = ^| 

with r and s coprime polynomials in 1 + TZ[T]. More precisely, it is enough 
to know upper bounds deg(r) < D\ and deg(s) < D2, and compute Nk for 
k = 1, . . . , D\ + D%. Then use the linear algebra method described prior to 
[22^ Corollary 2.8], which we now supplement with further details. 

Let u{T) = 1 + Yld=i u i^ % an d v(T) = 1 + Yld=i v i^ % have indeterminate 
coefficients. Write Z(f /¥ q )(T) = 1 + z\T + Z2T 2 + . . . . This power series 
has non-negative integer coefficients and it can easily be computed modulo 
t d 1+ d 2 +i given JVfc for fc = 1, . . . + D 2 . The equation v(T)Z(f/¥ q ) 
11 [ I ) mod T Dl+D2+1 defines a linear system Ax = y where A is a known 
square D\ + D2 integer matrix and y a known integer column vector. The 
entries in A and y are just coefficients from the power series Z{f /¥ q ) mod 
X Dl+r>2+1 . Let b be a bound on their bit length. The unknown entries in x 
are the coefficients of u and v. By [22J the set of all solutions to this system 
consists of precisely those vectors x derived by specialising the coefficients 
of u(T) and v(T) to equal those of d(T)r(T) and d(T)s(T), respectively, for 
some d(T) G 1 +TZ[T] with degree at most min(L>i — deg(r), D2 — deg(s)). 
In particular, the system has a unique solution (i.e. det(^4) 7^ 0) if and only 
if either deg(r) = D\ or deg(s) = D2 (or both). The determinant det(^4) 
can be computed using the small primes method in [71 Algorithm 5.10] 
in a number of bit operations bounded by 0((D\ + D2) 4 b 2 ) (TJ Theorem 
5.12]. Let us assume now that det(^4) 7^ and so the system has a unique 
solution, namely the unknown vector containing the integer coefficients of 
r and s. Let B be a bound on the bit length of these coefficients. Find 
the unique solution to the linear system modulo enough small primes which 
do not divide $\det(A)$, and recover this integer solution using the Chinese 
remainder theorem. Precisely, work modulo a collection of such primes 
whose product has bit length greater than $B$. This second step requires 
$\tilde{O}((D_1 + D_2)^4 B^2)$ bit operations using Gaussian elimination (this can be 
improved with a Padé approximation algorithm [7, Section 5.9]). Values for 
$b$ and $B$ may be deduced from the bound $N_k \le q^{nk}$. Specifically, one may 
show from this that the absolute values of the reciprocal zeros of r and s are 
all $\le q^n$, and so we can take $B = O((D_1 + D_2) n \log q)$. Also, we can take 
$b = O((D_1 + D_2)^2 n \log q)$. If in the above we find $\det(A) = 0$ then we must 
have that $\deg(r) < D_1$ and $\deg(s) < D_2$. In this case one must first reduce 
$D_2$, say, and compute determinants until the correct value $D_2 = \deg(s)$ is 
found (then $\det(A) \ne 0$ and the above method works). 
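
The recovery step can be illustrated on a toy case. The following is a minimal sketch, not the procedure costed above: it replaces the small primes method and Chinese remaindering by exact rational Gaussian elimination, which is simpler to read but does not achieve the stated bit complexity. The function names (`zeta_series`, `solve_exact`, `recover_rational`) are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction

def zeta_series(counts):
    """Return [1, z_1, ..., z_m] for exp(sum_k N_k T^k / k), counts = [N_1, ..., N_m]."""
    z = [Fraction(1)]
    for k in range(1, len(counts) + 1):
        # Logarithmic derivative of the zeta function: k*z_k = sum_{i=1}^k N_i*z_{k-i}.
        z.append(sum(counts[i - 1] * z[k - i] for i in range(1, k + 1)) / k)
    return z

def solve_exact(A, y):
    """Gauss-Jordan elimination over Q; returns None when A is singular."""
    n = len(A)
    M = [[Fraction(c) for c in row] + [Fraction(b)] for row, b in zip(A, y)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # det(A) = 0
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [c / lead for c in M[col]]
        for r in range(n):
            factor = M[r][col]
            if r != col and factor != 0:
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n] for row in M]

def recover_rational(z, D1, D2):
    """Solve v(T)*Z(T) = u(T) mod T^(D1+D2+1) for u (degree <= D1), v (degree <= D2)."""
    n = D1 + D2
    A, y = [], []
    for k in range(1, n + 1):
        # Coefficient of T^k: u_k - sum_{j=1}^{min(k,D2)} v_j*z_{k-j} = z_k.
        row = [0] * n
        if k <= D1:
            row[k - 1] = 1                  # column of u_k
        for j in range(1, min(k, D2) + 1):
            row[D1 + j - 1] = -z[k - j]     # column of v_j
        A.append(row)
        y.append(z[k])
    x = solve_exact(A, y)
    if x is None:
        return None                         # degree bounds too generous
    return [1] + x[:D1], [1] + x[D1:]

# Toy input: N_k = 3^k - 1, whose zeta function is (1 - T)/(1 - 3T).
u, v = recover_rational(zeta_series([2, 8]), 1, 1)
```

With $D_1 = D_2 = 1$ the system is nonsingular because both degree bounds are attained. Calling `recover_rational(zeta_series([2, 8, 26, 80]), 2, 2)` on the same zeta function returns `None`, which corresponds to the case $\det(A) = 0$ in which a degree bound must be reduced.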

By the refinement of Bombieri's degree bound [3] from [2, Equation 
(1.13)], the "total degree" $\deg(r) + \deg(s)$ is bounded by $2^{n+1} 6^{n+1} (n + 
1)!\,V(\Delta_1)$, where $\Delta_1$ is the polytope in $\mathbb{R}^{n+1}$ derived from the Newton 
polytope of $f$ (see the paragraph following Definition 12). Certainly $(n + 
1)!\,V(\Delta_1) \le d^n$. Hence we may take $D_1, D_2 = 2^{4n+4} d^n$ and so $D_1 + D_2 = 2^{4n+5} d^n$. 

Theorem 37. Let $f$ be a polynomial in $n$ variables of total degree $d > 0$ 
over $\mathbb{F}_q$, where $q = p^a$. The full zeta function $Z(f/\mathbb{F}_q)$ can be computed 
deterministically in 

$$\tilde{O}(2^{13n^2} a^{3n+7} d^{3n^2+9n} p^{2n+4})$$ 

bit operations. (Here we use soft-Oh notation, which ignores logarithmic 
factors, as defined at the start of Section Iff. 31 ) 

Proof. From Proposition 36 and the torus decomposition method, the bit 
complexity of computing $N_k$ for $k = 1, \dots, 2^{4n+5} d^n$ is 

$$\tilde{O}\Big(\Big\{\sum_{k=1}^{2^{4n+5} d^n} a^{3n+7} k^{3n+5} n^{3n+5} d^{3n} p^{2n+4}\Big\}\, 2^n\Big).$$ 

(The contribution from recovering $Z(f/\mathbb{F}_q)$ from the $N_k$ is absorbed in this 
estimate.) Tidying up the factor in $n$ we get the claimed result. □ 
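
The exponent of $d$ in Theorem 37 can be checked directly from the sum in the proof. A sketch, writing $K = 2^{4n+5} d^n$ (a shorthand introduced here only for illustration) and using the crude bound $\sum_{k=1}^{K} k^m \le K^{m+1}$:

```latex
\begin{align*}
\sum_{k=1}^{K} k^{3n+5} &\le K^{3n+6}
  = 2^{(4n+5)(3n+6)}\, d^{\,n(3n+6)}
  = 2^{12n^2+39n+30}\, d^{\,3n^2+6n}, \\
d^{\,3n^2+6n} \cdot d^{\,3n} &= d^{\,3n^2+9n}.
\end{align*}
```

The powers $a^{3n+7}$ and $p^{2n+4}$ do not depend on $k$ and pass through unchanged, which matches the exponents of $a$, $d$ and $p$ in the theorem. The remaining factors $2^{12n^2+39n+30}$, $2^n$ and $n^{3n+5}$ depend only on $n$ and make up the factor in $n$ that the proof tidies up.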

Since we may assume that $d > 1$ we have $2^n = O(d^n)$, and Theorem 1 
now follows. 

The proof of Corollary 2 was explained in the introduction, and we finish 
with some comments on Corollary 3. By Weil's theorem, the zeta function 
of the smooth projective curve $\tilde{V}$ from Corollary 3 is of the form 

$$Z(\tilde{V}/\mathbb{F}_q)(T) = \frac{P(T)}{(1 - T)(1 - qT)}$$ 

for some polynomial $P(T)$ whose reciprocal roots have complex absolute 
value $q^{1/2}$. Since the (possibly singular) affine curve $V$ and the smooth 
projective curve $\tilde{V}$ differ in only finitely many closed points, we deduce that 
the zeta function of $V$ is of the form 

$$Z(V/\mathbb{F}_q)(T) = \frac{P(T)Q(T)}{1 - qT},$$ 

where Q(T) is a rational function whose zeros and poles are roots of unity. 
This zeta function, and in particular the rational function P(T)Q(T), may 
be computed within the time bound in Corollary 3 by Corollary 2. In terms 
of the pure weight decomposition [22], the polynomial $P(T)$ (respectively, 
$Q(T)$) is exactly the pure weight 1 (respectively, weight 0) part of the product 
$P(T)Q(T)$, and can be recovered quickly from $P(T)Q(T)$ via the LLL 
polynomial factorization algorithm. In our current special case, one can proceed 
directly without using the LLL factorization algorithm. By repeatedly 
removing the common factor of the numerator of $P(T)Q(T)$ with $T^s - 1$, for each $s$ with 
$\varphi(s)$ (the Euler totient function) not greater than the total degree of $P(T)Q(T)$, 
the desired polynomial $P(T)$ can be recovered. The order of the group of 
rational points on the Jacobian of $\tilde{V}$ is simply $P(1)$, see [22]. Thus for fixed dimension and 
finite field one can compute the order of the group of rational points on the 
Jacobian of a smooth projective curve in time polynomial in the degree $d$. 
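
The direct procedure for stripping roots of unity can be sketched as follows. This is an illustrative implementation only, using exact rational polynomial arithmetic; the helper names are hypothetical, and the numerator in the example is made up for the illustration rather than taken from an actual curve. The search range for $s$ uses the elementary bound $\varphi(s) \ge \sqrt{s/2}$, so $\varphi(s) \le m$ forces $s \le 2m^2$.

```python
from fractions import Fraction
from math import gcd

def trim(p):
    """Drop trailing zero coefficients (lists are low degree first)."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def poly_divmod(a, b):
    """Quotient and remainder of a by b over Q."""
    a = trim([Fraction(c) for c in a])
    b = trim([Fraction(c) for c in b])
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 1)
    while len(a) >= len(b) and a != [0]:
        shift = len(a) - len(b)
        coef = a[-1] / b[-1]
        q[shift] = coef
        # Subtract coef * T^shift * b, which cancels the leading term of a.
        a = trim([c - coef * (b[i - shift] if 0 <= i - shift < len(b) else 0)
                  for i, c in enumerate(a)])
    return trim(q), a

def poly_gcd(a, b):
    """Monic gcd over Q via the Euclidean algorithm."""
    a = trim([Fraction(c) for c in a])
    b = trim([Fraction(c) for c in b])
    while b != [0]:
        a, b = b, poly_divmod(a, b)[1]
    return [c / a[-1] for c in a]

def phi(s):
    """Euler totient function, computed naively."""
    return sum(1 for k in range(1, s + 1) if gcd(k, s) == 1)

def strip_roots_of_unity(numerator):
    """Remove every factor shared with some T^s - 1, phi(s) <= degree."""
    p = trim([Fraction(c) for c in numerator])
    deg = len(p) - 1
    for s in range(1, 2 * deg * deg + 1):
        if phi(s) > deg:
            continue
        t_s_minus_1 = [-1] + [0] * (s - 1) + [1]
        while True:
            g = poly_gcd(p, t_s_minus_1)
            if len(g) == 1:          # constant gcd: nothing left to remove
                break
            p = poly_divmod(p, g)[0]
    return [c / p[0] for c in p]     # normalise to constant term 1

# Hypothetical numerator (1 - 2T + 5T^2)(1 - T)(1 + T), low degree first.
P = strip_roots_of_unity([1, -2, 4, 2, -5])
order = sum(P)                       # P(1)
```

In this toy example the recovered polynomial is $1 - 2T + 5T^2$ and the value $P(1)$ is $4$.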